Improving GP generalization: a variance-based layered learning approach

Folino Gianluigi
2015

Abstract

This paper introduces a new method, named variance-based layered learning GP, that improves the generalization ability of genetic programming (GP) on symbolic regression problems. In this approach, several datasets, called primitive training sets, are derived from the original training data. They are generated in order of increasing complexity, according to a suitable complexity measure; the last primitive dataset is still less complex than the original training set. The approach decomposes the evolutionary process into several hierarchical layers. The first layer of the evolution uses the least complex (smoothest) primitive training set; in subsequent layers, progressively more complex primitive sets are given to the GP engine, and finally the original training data is given to the algorithm. We use the variance of the output values of a function as a measure of functional complexity. This measure is used both to generate smoother training data and to control the functional complexity of the solutions, thereby reducing overfitting. Experiments conducted on four real-world and three artificial symbolic regression problems demonstrate that the approach enhances the generalization ability of GP and reduces the complexity of the obtained solutions. © 2014 Springer Science+Business Media New York.
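The following is a minimal sketch of the layered-learning scheme described in the abstract, not the authors' implementation. It assumes moving-average smoothing as one way to derive primitive training sets of increasing output variance, and a generic `run_gp` routine standing in for the GP engine; both of these names, and the window sizes, are hypothetical.

```python
import numpy as np

def smooth_targets(y, window):
    """Moving-average smoothing: larger windows yield lower-variance targets."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

def primitive_training_sets(X, y, windows=(9, 5, 3)):
    """Derive primitive sets ordered from least to most complex, as measured
    by the variance of the target values. The last primitive set is still
    smoother (less complex) than the original training data."""
    sets = [(X, smooth_targets(y, w)) for w in windows]  # shrinking window -> rising variance
    sets.append((X, y))  # the final layer uses the original training data
    return sets

def layered_gp(X, y, run_gp, generations_per_layer=50):
    """Evolve in hierarchical layers, seeding each layer with the previous
    layer's population (`run_gp` is a hypothetical GP-engine interface)."""
    population = None
    for X_layer, y_layer in primitive_training_sets(X, y):
        population = run_gp(X_layer, y_layer,
                            init_population=population,
                            generations=generations_per_layer)
    return population
```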
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Generalization
Genetic programming
Layered learning
Overfitting
Variance
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/261103
Citations
  • PMC: not available
  • Scopus: 13
  • Web of Science (ISI): not available