A New Approach to the Planning of New Experiments based on Learning in Non-Stationary Conditions
Murari A
2019
Abstract
In recent decades, advanced statistical and machine-learning tools have made enormous progress and found applications in many fields. Their adoption in the sciences, however, has been slowed by various factors, one fundamental limitation being that they assume stationary conditions: traditional machine-learning tools guarantee their results only if the data in the training set, the test set and the final application are sampled from the same probability distribution function. In most scientific applications, by contrast, the main objective of new experiments is precisely to explore uncharted regions of the parameter space in order to acquire new knowledge. Traditional covariate-shift methods are insufficient to address this issue. This paper proposes a new method based on the falsification of data-driven models, implemented through symbol manipulation with evolutionary programmes. The performance of the approach has been extensively tested numerically, demonstrating its competitive advantages. Its capability to handle practical, experimental cases is shown with the example of determining scaling laws for the design of new experiments, a typical problem that violates the assumption of stationarity. The same methodology can also be applied to large databases or the outputs of complex simulations, to focus the analysis effort on the most promising entries.