Maximizing pattern separation in discretizing continuous features for classification purposes

M Muselli
2010

Abstract

Discretization is a fundamental phase of many classification algorithms: it seeks a set of cutoffs that subdivide a continuous domain into homogeneous intervals, such that the points in each interval have a high probability of belonging to the same class. This paper proposes two approaches to discretization. The first retrieves the optimal set of separation points by solving a suitable linear programming problem. Since finding the optimal solution may require an excessive computational burden, an alternative technique, based on the iterative addition of separation points, is also described. This greedy algorithm is evaluated on some artificial datasets and compared with other well-known discretization techniques such as EntMDL. The simulation results show that the novel algorithm performs well in terms of both the accuracy of the solution and the computational effort required to generate it.
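
To make the idea of "iterative addition of separation points" concrete, the following is a minimal Python sketch for a single continuous feature. It is not the algorithm from the paper: the abstract does not state the selection criterion, so the sketch assumes one suggested by the title, namely that each step adds the cutoff separating the largest number of still-unseparated pairs of points with different class labels. The function name `greedy_cutoffs` and the parameter `max_cutoffs` are hypothetical.

```python
from itertools import combinations


def greedy_cutoffs(values, labels, max_cutoffs=None):
    """Greedily add cutoffs to one continuous feature (illustrative only).

    Assumption, not taken from the paper: a cutoff is scored by how many
    not-yet-separated pairs of opposite-class points fall on either side of it.
    """
    # Index pairs with different classes and different feature values.
    pairs = {
        (i, j)
        for i, j in combinations(range(len(values)), 2)
        if labels[i] != labels[j] and values[i] != values[j]
    }

    # Candidate cutoffs: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    def separated_by(cutoff):
        # Pairs whose two values lie strictly on opposite sides of the cutoff.
        return {
            (i, j)
            for i, j in pairs
            if min(values[i], values[j]) < cutoff < max(values[i], values[j])
        }

    chosen = []
    while pairs and candidates:
        if max_cutoffs is not None and len(chosen) >= max_cutoffs:
            break
        best = max(candidates, key=lambda c: len(separated_by(c)))
        gained = separated_by(best)
        if not gained:
            break  # no remaining candidate separates anything new
        chosen.append(best)
        candidates.remove(best)
        pairs -= gained  # these pairs are now separated
    return sorted(chosen)


# Two well-separated classes need a single cutoff between them.
print(greedy_cutoffs([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```

Under this assumed criterion, alternating class labels force a cutoff between every pair of neighbouring values, while cleanly separated classes need only one. The sketch recomputes scores from scratch at each step and is meant for illustration rather than efficiency.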
Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT
978-1-4244-6917-8
machine learning
discretization
classification problem


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/56452