GP Ensembles for Large Scale Data Classification

Gianluigi Folino, Clara Pizzuti, Giandomenico Spezzano
2006

Abstract

An extension of Cellular Genetic Programming for data classification (CGPC) that induces an ensemble of predictors is presented. Two algorithms implementing the bagging and boosting techniques are described and compared with CGPC. The approach can deal with large data sets that do not fit in main memory, since each classifier is trained on a subset of the overall training data. The predictors are then combined to classify new tuples. Experiments on several data sets show that, by using a training set of reduced size, better classification accuracy can be obtained, and at a much lower computational cost.
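The core idea above (train each ensemble member on a subset of the training data, then combine the predictors to classify new tuples) can be illustrated with a small bagging sketch. This is a hypothetical illustration, not the authors' CGPC code: a one-feature threshold rule (`train_stump`) stands in for a genetic-programming classifier, each member is trained on a bootstrap sample smaller than the full training set, and new tuples are labeled by simple majority vote. The names `train_stump`, `bagged_ensemble`, and `majority_vote`, the reduced bootstrap size, and the voting rule are all assumptions; the boosting variant is not shown.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Features = Tuple[float, ...]
Example = Tuple[Features, int]            # (feature vector, class label)
Classifier = Callable[[Features], int]


def train_stump(sample: Sequence[Example]) -> Classifier:
    """Hypothetical stand-in for one CGPC run: a one-feature threshold rule.

    Tries every (feature, threshold) pair seen in the sample and keeps the
    first one with the fewest training errors.
    """
    best_err, best_rule = None, None
    n_features = len(sample[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in sample}):
            left = [y for x, y in sample if x[f] <= threshold]   # never empty
            right = [y for x, y in sample if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            err = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best_err is None or err < best_err:
                best_err, best_rule = err, (f, threshold, left_label, right_label)
    feat, thr, lab_left, lab_right = best_rule
    return lambda x: lab_left if x[feat] <= thr else lab_right


def bagged_ensemble(data: Sequence[Example], n_members: int,
                    subset_size: int, seed: int = 0) -> List[Classifier]:
    """Train each member on its own bootstrap sample of `subset_size` examples."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(data) for _ in range(subset_size)]  # sampling with replacement
        members.append(train_stump(sample))
    return members


def majority_vote(members: Sequence[Classifier], x: Features) -> int:
    """Combine the members' predictions for one new tuple by plurality vote."""
    votes = Counter(m(x) for m in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 1-D data: label is 1 when the single feature exceeds 5.
    data = [((float(i),), int(i > 5)) for i in range(11)]
    ensemble = bagged_ensemble(data, n_members=15, subset_size=6)
    print([majority_vote(ensemble, (v,)) for v in (0.0, 10.0)])
```

Each member's training step sees only `subset_size` examples, which is the property the abstract relies on for data sets that do not fit in main memory; in a real setting the subsets would be read from disk or from separate nodes rather than drawn from an in-memory list as in this toy example.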
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR (Institute for High Performance Computing and Networking)
Keywords: data mining, genetic programming, classification, bagging, boosting
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/13285
Citations: Scopus 44