An incremental ensemble evolved by using genetic programming to efficiently detect drifts in cyber security datasets
Gianluigi Folino, Pietro Sabatino, Francesco Sergio Pisani
2016
Abstract
Unbalanced classes, the need to detect changes in real time, the speed of the streams, and other peculiar characteristics make most data mining algorithms unsuitable for datasets in the cyber security domain. To overcome these issues, we propose an ensemble-based algorithm that uses a distributed Genetic Programming framework to generate the function that combines the classifiers, together with efficient strategies to react to changes in the data. Once the base classifiers are trained, the combining function of the ensemble, which is built from non-trainable functions, can be generated without any extra training phase. Meanwhile, the adopted drift detection function, together with a strategy for replacing classifiers, allows the ensemble to respond efficiently to changes. Preliminary experiments conducted on an artificial dataset and on a real intrusion detection dataset show the effectiveness of the approach.
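The abstract does not specify the genetic operators, the exact set of non-trainable functions, the drift detection function, or the replacement policy. The Python sketch below therefore fills these in with labeled assumptions, only to illustrate the mechanism the abstract describes. Every name here is hypothetical: `FUNCS` (average, max, min, median as the non-trainable combiners), a mutation-only GP in `evolve`, a `DriftMonitor` that compares windowed accuracy against a baseline, and `replace_worst` with a toy threshold learner. The sketch shows the key property claimed above. Once the base classifiers exist, evolving the combiner needs only their output scores, so no classifier is retrained. A drift signal then triggers the replacement of one classifier followed by a new evolution of the combiner.

```python
import random
import statistics
from typing import Callable, List, Sequence, Union

# Assumed set of non-trainable combiners: each maps base-classifier scores
# in [0, 1] to one score in [0, 1] and has no weights to fit.
FUNCS = {"avg": lambda xs: sum(xs) / len(xs), "max": max,
         "min": min, "median": statistics.median}

# A tree is a leaf (int index of a base classifier) or a tuple (func_name, *children).
Tree = Union[int, tuple]
Classifier = Callable[[float], float]

def evaluate(tree: Tree, scores: Sequence[float]) -> float:
    """Apply the combiner tree to one sample's base-classifier scores."""
    if isinstance(tree, int):
        return scores[tree]
    name, *children = tree
    return FUNCS[name]([evaluate(c, scores) for c in children])

def accuracy(tree: Tree, score_rows, labels) -> float:
    """Fraction of samples whose combined score, thresholded at 0.5, matches the label."""
    hits = sum((evaluate(tree, s) >= 0.5) == bool(y) for s, y in zip(score_rows, labels))
    return hits / len(labels)

def random_tree(n_clf: int, depth: int, rng: random.Random) -> Tree:
    """Grow a random binary tree with at most `depth` function levels."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_clf)
    return (rng.choice(list(FUNCS)),
            random_tree(n_clf, depth - 1, rng), random_tree(n_clf, depth - 1, rng))

def mutate(tree: Tree, n_clf: int, rng: random.Random) -> Tree:
    """Subtree mutation: replace this node, or recurse into one random child."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_clf, 2, rng)
    name, *children = tree
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], n_clf, rng)
    return (name, *children)

def evolve(score_rows, labels, n_clf: int, generations=20, pop_size=30, seed=0) -> Tree:
    """Evolve a combiner from the base classifiers' outputs only; nothing is retrained."""
    rng = random.Random(seed)
    def fit(t: Tree) -> float:
        return accuracy(t, score_rows, labels)
    # Seeding with each single classifier plus elitism keeps the result at least
    # as accurate, on this data, as the best base classifier alone.
    pop = list(range(n_clf)) + [random_tree(n_clf, 3, rng) for _ in range(pop_size - n_clf)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(rng.choice(parents), n_clf, rng)
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fit)

def scores_of(classifiers: List[Classifier], X) -> List[List[float]]:
    return [[clf(x) for clf in classifiers] for x in X]

def stump(t: float) -> Classifier:
    """Toy base classifier: score 1.0 ('attack') when the feature reaches t."""
    return lambda x: 1.0 if x >= t else 0.0

def train_stump(X, y) -> Classifier:
    """Toy learner: pick the observed threshold that best fits the labels."""
    return stump(max(sorted(set(X)),
                     key=lambda t: sum((x >= t) == bool(l) for x, l in zip(X, y))))

class DriftMonitor:
    """Hypothetical detector: drift when window accuracy < baseline - tolerance."""
    def __init__(self, baseline: float, tolerance: float = 0.1):
        self.baseline, self.tolerance = baseline, tolerance

    def drifted(self, window_accuracy: float) -> bool:
        return window_accuracy < self.baseline - self.tolerance

def replace_worst(classifiers: List[Classifier], X, y, train_fn=train_stump) -> int:
    """Swap the weakest classifier on the window for one trained on it; return its index."""
    def acc(clf: Classifier) -> float:
        return sum((clf(x) >= 0.5) == bool(l) for x, l in zip(X, y)) / len(y)
    worst = min(range(len(classifiers)), key=lambda i: acc(classifiers[i]))
    classifiers[worst] = train_fn(X, y)
    return worst

# Toy stream: one feature, and the concept shifts from x >= 0.5 to x >= 0.8.
X = [i / 20 for i in range(20)]
y_old = [int(x >= 0.5) for x in X]
y_new = [int(x >= 0.8) for x in X]
clfs = [stump(0.3), stump(0.5), stump(0.7)]
tree = evolve(scores_of(clfs, X), y_old, len(clfs))
monitor = DriftMonitor(baseline=accuracy(tree, scores_of(clfs, X), y_old))
if monitor.drifted(accuracy(tree, scores_of(clfs, X), y_new)):
    replace_worst(clfs, X, y_new)
    tree = evolve(scores_of(clfs, X), y_new, len(clfs))
print(accuracy(tree, scores_of(clfs, X), y_new))  # 1.0
```

In this toy run the combiner evolved before the change drops to 0.7 accuracy on the new concept. The monitor flags drift, and the weakest stump is replaced by one trained on the current window. A new evolution of the combiner then recovers full accuracy. A real system would use a proper GP framework with crossover and, as the abstract states, a distributed population, together with real base learners and a principled drift test.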