Intrusion detection tools have largely benefitted from the usage of supervised classification methods developed in the field of data mining. However, the data produced by modern system/network logs pose many problems, such as the streaming and non-stationary nature of such data, their volume and velocity, and the presence of imbalanced classes. Classifier ensembles look a valid solution for this scenario, owing to their flexibility and scalability. In particular, data-driven schemes for combining the predictions of multiple classifiers have been shown superior to traditional fixed aggregation criteria (e.g., predictions' averaging and weighted voting). In intrusion detection settings, however, such schemes must be devised in an efficient way, since (part of) the ensemble may need to be re-trained frequently. A novel ensemble-based framework is proposed here for the online intrusion detection, where the ensemble is updated through an incremental stream-oriented learning scheme, correspondingly to the detection of concept drifts. Differently from mainstream ensemble-based approaches in the field, our proposal relies on deriving, though an efficient genetic programming (GP) method, an expressive kind of combiner function defined in terms of (non-trainable) aggregation functions. This approach is supported by a system architecture, which integrates different kinds of functionalities, ranging from the drift detection, to the induction and replacement of base classifiers, up to the distributed computation of GP-based combiners. Experiments on both artificial and real-life datasets confirmed the validity of the approach

A GP-based ensemble classification framework for time-changing streams of intrusion detection data

Gianluigi Folino;Luigi Pontieri
2020

Abstract

Intrusion detection tools have largely benefitted from the usage of supervised classification methods developed in the field of data mining. However, the data produced by modern system/network logs pose many problems, such as the streaming and non-stationary nature of such data, their volume and velocity, and the presence of imbalanced classes. Classifier ensembles look a valid solution for this scenario, owing to their flexibility and scalability. In particular, data-driven schemes for combining the predictions of multiple classifiers have been shown superior to traditional fixed aggregation criteria (e.g., predictions' averaging and weighted voting). In intrusion detection settings, however, such schemes must be devised in an efficient way, since (part of) the ensemble may need to be re-trained frequently. A novel ensemble-based framework is proposed here for the online intrusion detection, where the ensemble is updated through an incremental stream-oriented learning scheme, correspondingly to the detection of concept drifts. Differently from mainstream ensemble-based approaches in the field, our proposal relies on deriving, though an efficient genetic programming (GP) method, an expressive kind of combiner function defined in terms of (non-trainable) aggregation functions. This approach is supported by a system architecture, which integrates different kinds of functionalities, ranging from the drift detection, to the induction and replacement of base classifiers, up to the distributed computation of GP-based combiners. Experiments on both artificial and real-life datasets confirmed the validity of the approach
2020
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Data streams
Cybersecurity
Ensemble learning
Genetic programming
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/381071
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact