Combining Ensemble of Classifiers by Using Genetic Programming for Cyber Security Applications
Gianluigi Folino; Francesco Sergio Pisani
2015
Abstract
Classification is a relevant task in the cyber security domain, but it must cope with unbalanced and/or incomplete datasets and must also react in real time to changes in the data. Ensembles of classifiers are a useful tool for classification in hard domains, as they combine different classifiers that together provide complementary information. However, most ensemble-based algorithms require an extensive training phase and need to be retrained when the data change. This work proposes a Genetic Programming-based framework to generate a function for combining an ensemble, with some interesting properties: the models composing the ensemble are trained only on a portion of the training set, and they can then be combined and used without any extra training phase; furthermore, when the data change, the function can be recomputed incrementally, with moderate computational effort. Experiments conducted on unbalanced datasets and on a well-known cyber-security dataset assess the effectiveness of the approach.
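To make the idea of a combining function concrete, the following minimal Python sketch shows how an expression tree built from non-trainable operators could merge the outputs of already-trained base classifiers. It is an illustration under stated assumptions, not the framework described in the paper: the primitives `avg` and `vmax`, the tuple-based tree encoding, and the helper names `evaluate` and `predict` are all hypothetical, and the Genetic Programming search that would evolve such a tree is not shown.

```python
from typing import List, Sequence, Union

Probs = List[float]  # one score per class, produced by a base classifier


# Hypothetical non-trainable primitives: each merges several score vectors
# element-wise, so no extra training is needed to apply them.
def avg(*vectors: Probs) -> Probs:
    return [sum(column) / len(column) for column in zip(*vectors)]


def vmax(*vectors: Probs) -> Probs:
    return [max(column) for column in zip(*vectors)]


# Assumed encoding: a tree is either an int (index of a base classifier)
# or a tuple (primitive, child, child, ...).
Tree = Union[int, tuple]


def evaluate(tree: Tree, outputs: Sequence[Probs]) -> Probs:
    """Recursively apply the combiner tree to the base classifiers' outputs."""
    if isinstance(tree, int):
        return outputs[tree]
    primitive, *children = tree
    return primitive(*(evaluate(child, outputs) for child in children))


def predict(tree: Tree, outputs: Sequence[Probs]) -> int:
    """Return the index of the class with the highest combined score."""
    scores = evaluate(tree, outputs)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Scores of three base classifiers for one sample and two classes.
    outputs = [[0.75, 0.25], [0.25, 0.75], [0.0, 1.0]]
    # A candidate combiner of the kind a GP search might produce.
    tree = (avg, (vmax, 0, 1), 2)
    print(evaluate(tree, outputs), predict(tree, outputs))
```

In this sketch, adapting to changed data would mean searching for a new tree while leaving the base classifiers untouched, which is the property the abstract highlights.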


