Mathematical models of supervised learning and application to medical diagnosis

Guarracino Mario Rosario
2012

Abstract

Supervised learning models are applied in many fields of science and technology, including economics, engineering and medicine. Among supervised learning algorithms, Support Vector Machines (SVMs) offer accurate solutions and low training time. They are based on statistical learning theory and obtain the solution by minimizing a quadratic cost function. Combined with kernel methods, SVMs provide nonlinear classification models, namely separations that cannot be expressed using inequalities on linear combinations of parameters. Some issues may reduce the effectiveness of these methods. For example, in multi-center clinical trials, experts from different institutions collect data on many patients. In this case, the techniques currently in use build the model from all the available data. Although such models are well suited to the cases under consideration, they do not provide accurate answers in general. It is therefore necessary to identify a subset of the training set that contains all the available information and yields a model that still generalizes to new test data. Training sets may also vary over time, for example because data are added and modified as a result of new tests or new knowledge. In this case, current techniques cannot capture the changes and must restart the learning process from the beginning. Techniques that extract only the new knowledge contained in the data and build the learning model incrementally have the advantage of taking into account only the truly useful experiments and of speeding up the analysis. In this paper, we describe some solutions to these problems, supported by numerical experiments on the discrimination among different types of leukemia. © Springer Science+Business Media New York 2012.
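The idea of keeping only the informative subset of the data and updating the model incrementally can be illustrated with a small sketch. This is not the method of the paper, which the abstract does not specify. It swaps in a kernel perceptron with a Gaussian kernel as a simple stand-in for a kernel SVM. The class name `IncrementalKernelPerceptron`, the `gamma` value and the toy data are all assumptions made for illustration.

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


class IncrementalKernelPerceptron:
    """Hypothetical stand-in for a kernel classifier (not an SVM).

    A training point is retained only if the current model misclassifies it,
    so the model stores a subset of the data and can absorb new batches
    without restarting from scratch.
    """

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.points = []  # retained training points
        self.labels = []  # their labels, +1 or -1

    def score(self, x):
        # Signed sum of kernel similarities to the retained points.
        return sum(y * rbf(p, x, self.gamma)
                   for p, y in zip(self.points, self.labels))

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def partial_fit(self, X, y, epochs=10):
        # Sweep the batch until no point is misclassified (or epochs run out).
        for _ in range(epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * self.score(xi) <= 0:  # wrong side or on the boundary
                    self.points.append(xi)
                    self.labels.append(yi)
                    mistakes += 1
            if mistakes == 0:
                break
        return self


# First batch: an XOR-like layout that no linear separator can classify.
X1 = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y1 = [-1, -1, 1, 1]
model = IncrementalKernelPerceptron(gamma=2.0).partial_fit(X1, y1)
retained_after_first = len(model.points)

# Second batch: two points the model already explains, one that it does not.
X2 = [(0.1, 0.1), (0.9, 0.1), (2.0, 2.0)]
y2 = [-1, 1, 1]
model.partial_fit(X2, y2)
retained_after_second = len(model.points)

print(retained_after_first, retained_after_second)
```

In this toy run, the second call to `partial_fit` reuses the points already retained and adds only the new point that the existing model misclassifies. The two redundant points are not stored.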
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
9781461441328


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/272000