Combining Ensemble of Classifiers by Using Genetic Programming for Cyber Security Applications

Gianluigi Folino; Francesco Sergio Pisani
2015

Abstract

Classification is a relevant task in the cyber security domain, but it must cope with unbalanced and/or incomplete datasets and react in real time to changes in the data. Ensembles of classifiers are a useful tool for classification in hard domains, as they combine different classifiers that together provide complementary information. However, most ensemble-based algorithms require an extensive training phase and must be retrained when the data change. This work proposes a Genetic Programming-based framework to generate a function for combining an ensemble, with some interesting properties: the models composing the ensemble are trained only on a portion of the training set and can then be combined and used without any extra training phase; furthermore, when the data change, the function can be recomputed incrementally, with moderate computational effort. Experiments conducted on unbalanced datasets and on a well-known cyber-security dataset assess the effectiveness of the approach.
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
978-3-319-16548-6
data mining
intrusion detection
genetic programming

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/303824