Statistical genetic programming for symbolic regression

Folino G
2017

Abstract

This paper proposes a new genetic programming (GP) algorithm for symbolic regression problems. The algorithm, named statistical genetic programming (SGP), uses statistical information, such as variance, mean, and correlation coefficient, to improve GP. To this end, we define a well-structured tree as a tree with the following property: nodes closer to the root have a higher correlation with the target. Experiments show that, on average, trees whose structures are closer to well-structured trees are smaller than other trees. SGP biases the search toward solutions whose structures are closer to a well-structured tree. For this purpose, it extends the terminal set with some small well-structured subtrees and starts the search in a space limited to semi-well-structured trees (i.e., trees with at least one well-structured subtree). Moreover, SGP incorporates new genetic operators, namely correlation-based mutation and correlation-based crossover, which use the correlation between the outputs of each subtree and the targets to improve their functionality. Furthermore, we suggest a variance-based editing operator that reduces the size of the trees. With these operators, SGP explores the search space so as to obtain more accurate and smaller solutions in less time. SGP is tested on several symbolic regression benchmarks. The results show that it increases the evolution rate, the accuracy of the solutions, and the generalization ability, and decreases the rate of code growth.
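The well-structured property defined in the abstract can be illustrated with a small sketch. The abstract does not give the exact measure, so this example makes two assumptions: correlation is taken as the absolute Pearson coefficient, and "closer to the root" is read as each node correlating with the target at least as strongly as each of its children. The names `Node`, `evaluate`, `pearson`, and `is_well_structured` are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Node:
    """Expression-tree node; a terminal has no children and names a variable."""
    label: str
    fn: Optional[Callable[..., float]] = None
    children: Tuple["Node", ...] = ()


def evaluate(node, row):
    """Evaluate the subtree rooted at `node` on one sample (dict: variable -> value)."""
    if not node.children:
        return row[node.label]
    return node.fn(*(evaluate(child, row) for child in node.children))


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant (assumption)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def abs_corr(node, rows, target):
    """Absolute correlation between a subtree's outputs and the target values."""
    return abs(pearson([evaluate(node, row) for row in rows], target))


def is_well_structured(node, rows, target):
    """Assumed reading: every node correlates with the target at least as
    strongly as each of its children, checked recursively."""
    own = abs_corr(node, rows, target)
    return all(
        own >= abs_corr(child, rows, target)
        and is_well_structured(child, rows, target)
        for child in node.children
    )


# Toy data in which the target is exactly x1 + x2.
rows = [{"x1": 1, "x2": 2}, {"x1": 2, "x2": 1}, {"x1": 3, "x2": 4}, {"x1": 4, "x2": 3}]
target = [row["x1"] + row["x2"] for row in rows]

x1, x2 = Node("x1"), Node("x2")
add_tree = Node("add", lambda a, b: a + b, (x1, x2))
sub_tree = Node("sub", lambda a, b: a - b, (x1, x2))

print(is_well_structured(add_tree, rows, target))
print(is_well_structured(sub_tree, rows, target))
```

On this toy data the root of `add_tree` reproduces the target, so it correlates with the target more strongly than either input does, whereas the root of `sub_tree` is uncorrelated with the target even though its children are not. Under the assumed reading, only the first tree satisfies the property.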
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Correlation coefficient
Genetic programming
Semi-well-structured tree
Symbolic regression
Well-structured subtree
Well-structuredness measure

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/336701