A New Approach to Experimental Design Based on Learning in Non-Stationary Conditions

Murari A
2019

Abstract

Advanced statistical and machine-learning tools are extremely powerful, but they rest on the so-called i.i.d. assumption: that the data are sampled independently from an identical distribution function. The properties of these techniques are therefore guaranteed only when this assumption holds. Because it implies that the data in the training set, the test set and the final application are all sampled from the same probability distribution function, the use of these tools in many scientific domains is delicate. One of the most problematic applications is the design of new experiments or machines, whose main objective is precisely to explore uncharted regions of the parameter space to acquire new knowledge. This contribution proposes an original method to support scientists in the design of new experiments, based on the falsification of data-driven models. The technique relies on symbol manipulation with evolutionary programmes. The approach has been extensively tested with synthetic data, which demonstrated its potential and competitive advantages. Its capability to handle practical, experimental cases has been shown by determining scaling laws of the energy confinement time in tokamaks, a typical task that violates the assumption of stationarity. The same technique can also be used to investigate large databases or the outputs of complex simulations, so as to focus analysis efforts on the most promising entries.
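The i.i.d. caveat described in the abstract can be illustrated with a minimal Python sketch. It is not the method of the paper, which relies on symbolic manipulation with evolutionary programmes; it only assumes a hypothetical ground truth (`truth`), two hand-picked candidate model forms, and arbitrary "explored" and "unexplored" ranges. Both candidates fit the explored region well, yet they diverge when extrapolated, and the point of largest disagreement is a natural place to look for an experiment that could falsify one of them.

```python
import numpy as np


def fit_power_law(x, y):
    """Least-squares fit of y = c * x**a in log-log space; returns (c, a)."""
    a, log_c = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_c), a


def fit_linear(x, y):
    """Least-squares fit of y = m * x + q; returns (m, q)."""
    m, q = np.polyfit(x, y, 1)
    return m, q


def rel_rmse(pred, ref):
    """Root-mean-square relative error of pred with respect to ref."""
    return float(np.sqrt(np.mean(((pred - ref) / ref) ** 2)))


def truth(x):
    """Hypothetical ground truth, chosen only for this illustration."""
    return 2.0 * np.sqrt(x)


# Explored region: narrow range with 1% multiplicative noise (assumed values).
rng = np.random.default_rng(0)
x_train = np.linspace(4.0, 6.0, 40)
y_train = truth(x_train) * (1.0 + 0.01 * rng.standard_normal(x_train.size))

# Two candidate data-driven models, both fitted on the explored region only.
c, a = fit_power_law(x_train, y_train)
m, q = fit_linear(x_train, y_train)

# Unexplored region: the distribution of x differs from the training one.
x_new = np.linspace(20.0, 40.0, 21)
power_new = c * x_new ** a
linear_new = m * x_new + q

# In-distribution errors are small for both candidates...
power_train_err = rel_rmse(c * x_train ** a, truth(x_train))
linear_train_err = rel_rmse(m * x_train + q, truth(x_train))
# ...but the linear candidate degrades badly outside the explored region.
linear_new_err = rel_rmse(linear_new, truth(x_new))

# Candidate location for a new experiment: where the models disagree most.
disagreement = np.abs(power_new - linear_new)
x_next = float(x_new[np.argmax(disagreement)])
```

In this toy setting a measurement at `x_next` would discriminate between the two candidates far better than another point inside the explored range, where their predictions are nearly indistinguishable.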
Istituto gas ionizzati - IGI - Sede Padova
Istituto per la Scienza e Tecnologia dei Plasmi - ISTP
Tokamak
statistical-learning tools
machine-learning tools
non-stationary conditions

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/365576