Exploiting fractal dimension and a distributed evolutionary approach to classify data streams with concept drifts

Gianluigi Folino; Massimo Guarascio; Giuseppe Papuzzo
2019

Abstract

Evolutionary algorithms such as Genetic Programming (GP) have been used successfully for classification, mainly because they are less likely to get stuck in local optima, can operate on chunks of data, and allow multiple solutions to be computed in parallel. Ensemble techniques are usually more accurate than the component learners that make up the ensemble, and they can be built incrementally, which improves flexibility, lets the model adapt to changes, and preserves part of the history present in the data. This paper proposes a framework based on a distributed GP ensemble algorithm that copes with different types of concept drift when classifying large data streams. The framework detects changes very efficiently using only a detection function based on the fractal dimension, which can also work on new incoming unclassified data. The distributed GP algorithm is therefore run only when a change is detected, in order to improve classification accuracy; together with an adaptive procedure, this allows the framework to respond to changes in a short time. Experiments are conducted on one real dataset and several artificial datasets to assess how well the framework detects drift and how quickly it responds.
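The abstract does not say how the fractal dimension is computed or how the detection function turns it into a decision. As an illustration only, the following minimal Python sketch assumes a generic box-counting estimate over unlabeled points scaled to the unit hypercube, and flags a change when the estimate for a new window moves away from that of a reference window. The names `box_counting_dimension` and `drift_detected`, the grid levels, and the threshold of 0.3 are hypothetical choices, not taken from the paper.

```python
import math


def box_counting_dimension(points, levels=(1, 2, 3, 4, 5)):
    """Estimate the box-counting dimension of points lying in [0, 1]^d.

    For each level k the unit hypercube is split into 2**k cells per axis,
    the occupied cells are counted, and the dimension is taken as the
    least-squares slope of log(count) against log(2**k).
    """
    xs, ys = [], []
    for k in levels:
        cells_per_axis = 2 ** k
        occupied = {
            # Clamp so that a coordinate equal to 1.0 falls in the last cell.
            tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in p)
            for p in points
        }
        xs.append(math.log(cells_per_axis))
        ys.append(math.log(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def drift_detected(reference, window, threshold=0.3):
    """Hypothetical rule: flag drift when the two estimates differ by more
    than `threshold` (an assumed value, not one from the paper)."""
    gap = box_counting_dimension(window) - box_counting_dimension(reference)
    return abs(gap) > threshold


# Synthetic, unlabeled data: points on a diagonal line versus a filled square.
line = [(i / 2000, i / 2000) for i in range(2000)]
square = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]

print(box_counting_dimension(line))    # close to 1
print(box_counting_dimension(square))  # close to 2
print(drift_detected(line, square))
```

Because the estimate uses only feature values, a check of this kind needs no class labels, which is consistent with the abstract's remark that detection can work on unclassified data. In a setup like the one described, the expensive GP retraining would be triggered only when such a check fires.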
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
ensemble
genetic programming
data streams
concept drift

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/387590
Citations
  • PMC: not available
  • Scopus: 6
  • ISI: not available