CNR Institutional Research Information System

A New Approach to Experimental Design Based on Learning in Non-Stationary Conditions

Murari A; Gelfusa M; Lungaroni M; Peluso E; Gaudio P; JET Contributors

2019

Abstract

Advanced statistical and machine-learning tools are extremely powerful, but they rest on the so-called i.i.d. assumption: that the data are independent and identically distributed, i.e. sampled randomly from the same distribution function. The properties of these techniques are therefore guaranteed only if this assumption holds. Since it implies that the data in the training set, the test set and the final application must be sampled from the same probability distribution function, the use of these tools in many scientific domains is delicate. One of the most problematic applications is the design of new experiments or machines, whose main objective is precisely to explore uncharted regions of the parameter space to acquire new knowledge. This contribution proposes an original method, based on the falsification of data-driven models, to support scientists in the design of new experiments. The technique relies on symbolic manipulation with evolutionary programs. The performance of the approach has been extensively tested with synthetic data, demonstrating its potential and competitive advantages. Its capability to handle practical, experimental cases has been shown with the example of determining scaling laws for the energy confinement time in tokamaks, a typical task that violates the assumption of stationarity. The same technique can also be used to investigate large databases or the outputs of complex simulations, so as to focus the analysis effort on the most promising entries.
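The record does not describe the algorithm itself. The following is only a minimal, hypothetical Python sketch of the general idea of choosing an experiment by model falsification; it is not the authors' method, which relies on symbolic manipulation with evolutionary programs. Two candidate data-driven models (a power law, the usual form of confinement-time scaling laws, and a "bent" power law) are fitted to synthetic data covering only a narrow region of parameter space, and the next experiment is placed where their predictions disagree most, so that a single measurement can falsify at least one of them. The synthetic data, both model forms and the function names `fit_power_law`, `fit_bent_power_law` and `most_discriminating_point` are illustrative assumptions.

```python
import numpy as np


def fit_power_law(x, y):
    """Hypothetical candidate 1: y = C * x**alpha, least-squares fit in log space."""
    A = np.column_stack([np.ones_like(x), np.log(x)])
    (log_c, alpha), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return lambda x_new: np.exp(log_c + alpha * np.log(x_new))


def fit_bent_power_law(x, y):
    """Hypothetical candidate 2: log y quadratic in log x (a 'bent' power law)."""
    lx = np.log(x)
    A = np.column_stack([np.ones_like(lx), lx, lx**2])
    c0, c1, c2 = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
    return lambda x_new: np.exp(c0 + c1 * np.log(x_new) + c2 * np.log(x_new) ** 2)


def most_discriminating_point(models, candidates):
    """Return the candidate input where the models' log-predictions spread the most,
    i.e. where one new measurement is most likely to falsify some model."""
    log_preds = np.array([np.log(m(candidates)) for m in models])
    spread = log_preds.max(axis=0) - log_preds.min(axis=0)
    return candidates[np.argmax(spread)], spread


# Synthetic non-stationary setting (assumed): training data cover only 1 <= x <= 3,
# while the new experiment may be planned anywhere in 1 <= x <= 30.
rng = np.random.default_rng(0)
x_train = rng.uniform(1.0, 3.0, size=40)
y_true = 2.0 * x_train**0.9 * np.exp(-0.05 * np.log(x_train) ** 2)
y_train = y_true * np.exp(rng.normal(0.0, 0.02, size=x_train.size))  # 2% multiplicative noise

models = [fit_power_law(x_train, y_train), fit_bent_power_law(x_train, y_train)]
candidates = np.linspace(1.0, 30.0, 59)
x_next, spread = most_discriminating_point(models, candidates)

print(f"next experiment at x = {x_next:.1f}")
print(f"max disagreement inside training range: {spread[candidates <= 3.0].max():.4f}")
print(f"disagreement at x_next: {spread.max():.4f}")
```

In this synthetic setting the two candidates agree closely where the training data lie, so those data cannot discriminate between them; their predictions diverge only outside that region, which is also where extrapolating either model breaks the i.i.d. assumption. Selecting the point of maximum disagreement turns that region into a test of the models rather than a blind extrapolation.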

Year
2019

Organizational units
Istituto gas ionizzati - IGI - Sede Padova
Istituto per la Scienza e Tecnologia dei Plasmi - ISTP

Keywords
Tokamak
statistical-learning tools
machine-learning tools
non-stationary conditions

Appears in item types
04.04 Unpublished presentation/communication (conference, event, webinar...)

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/365576
