A New Approach to Experimental Design Based on Learning in Non-Stationary Conditions
Murari A
2019
Abstract
Advanced statistical and machine-learning tools are extremely powerful, but they rest on the so-called i.i.d. assumptions: the data are independent and identically distributed, i.e. sampled independently from the same probability distribution. The properties of these techniques are therefore guaranteed only if these assumptions are satisfied. Since the assumptions imply that the data in the training set, the test set and the final application have to be sampled from the same probability distribution function, the use of these tools in many scientific domains is delicate. One of the most problematic applications is the design of new experiments or machines, whose main objective is precisely to explore uncharted regions of the parameter space to acquire new knowledge. In this contribution, an original method to support the scientist in the design of new experiments is proposed, based on the falsification of data-driven models. The technique relies on symbol manipulation with evolutionary programmes. The performance of the developed approach has been extensively tested with synthetic data, demonstrating its potential and competitive advantages. The capability of the methodology to handle practical and experimental cases has been shown with the example of determining scaling laws of the energy confinement time in tokamaks, a typical task that violates the assumptions of stationarity. The same technique can also be adopted to investigate large databases or the outputs of complex simulations, in order to focus the analysis efforts on the most promising entries.
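As an illustration only, the idea of designing an experiment by trying to falsify data-driven models can be sketched on a toy problem. The sketch below is not the method of the paper, which relies on symbolic manipulation with evolutionary programmes. It assumes two hand-picked candidate models (a power law and a straight line), a hypothetical synthetic law y = 2 x^0.5, and a simple selection rule: propose the next measurement where the two fitted models disagree most. All function names and numerical values are assumptions made for this example.

```python
import numpy as np


def fit_power_law(x, y):
    """Least-squares fit of y = c * x**a in log-log space; returns (c, a)."""
    a, log_c = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_c), a


def fit_linear(x, y):
    """Least-squares fit of y = m * x + q; returns (m, q)."""
    m, q = np.polyfit(x, y, 1)
    return m, q


def most_discriminating_point(x_train, y_train, x_candidates):
    """Return the candidate x where the two fitted models disagree most.

    Both models are fitted on the explored region only. A measurement at
    the returned point would falsify at least one of them, provided the
    disagreement there exceeds the measurement error.
    """
    c, a = fit_power_law(x_train, y_train)
    m, q = fit_linear(x_train, y_train)
    pred_power = c * x_candidates**a
    pred_linear = m * x_candidates + q
    disagreement = np.abs(pred_power - pred_linear)
    return x_candidates[np.argmax(disagreement)], disagreement


# Hypothetical synthetic data: the explored region is x in [1, 2],
# with 1% multiplicative noise on the assumed law y = 2 * x**0.5.
rng = np.random.default_rng(0)
x_train = np.linspace(1.0, 2.0, 30)
y_train = 2.0 * x_train**0.5 * (1.0 + 0.01 * rng.standard_normal(x_train.size))

# Unexplored region where a new experiment could be performed.
x_candidates = np.linspace(2.0, 10.0, 81)

x_next, disagreement = most_discriminating_point(x_train, y_train, x_candidates)
print(f"Proposed next measurement at x = {x_next:.2f}")
```

In this toy setting both models describe the training interval almost equally well, so the training data alone cannot discriminate between them. Their predictions diverge only outside the explored region, which is exactly where the i.i.d. assumptions no longer hold and where a new measurement is most informative.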


