Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data

Perera-Lago, Javier; Toscano-Duran, Victor; Paluzo-Hidalgo, Eduardo; Narteni, Sara; Rucco, Matteo

doi:10.1007/978-3-031-63803-9_21

Machine learning algorithms are fundamental components of novel data-informed artificial intelligence (AI) architectures. In this domain, representative datasets are a cornerstone of AI development, since they are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model’s complexity, power, and uncertainties. In this paper, we investigate, from a theoretical perspective, the reliability of the ε-representativeness method for assessing dataset similarity in the case of decision trees. We focus on the family of decision trees because it includes a wide variety of models known to be explainable. We provide a result guaranteeing that if two datasets are related by ε-representativeness, i.e., both of them have points closer than ε, then the predictions of the classic decision tree are similar. Experimentally, we have also tested that ε-representativeness presents a significant correlation with the ordering of the feature importance. Moreover, we extend the results experimentally in the context of unseen vehicle collision data for XGBoost, a machine learning component widely adopted for dealing with tabular data.
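As an illustration of the closeness condition mentioned in the abstract, the sketch below computes the smallest ε for which every point of each dataset has a point of the other dataset within distance ε. This is only one reading of the phrase "both of them have points closer than ε": it assumes Euclidean distance, ignores class labels, and is not taken from the paper. The function names `directed_gap` and `smallest_epsilon` are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def directed_gap(source, target):
    """Largest distance from a point in `source` to its nearest point in `target`."""
    return max(min(dist(p, q) for q in target) for p in source)


def smallest_epsilon(a, b):
    """Smallest epsilon such that each dataset has a point within epsilon
    of every point of the other (an assumed, symmetric reading)."""
    return max(directed_gap(a, b), directed_gap(b, a))


# Two tiny hypothetical datasets in the plane.
train = [(0.0, 0.0), (1.0, 0.0)]
unseen = [(0.0, 0.1), (1.0, 0.0)]
eps = smallest_epsilon(train, unseen)
```

Under this reading, a small `eps` would indicate that the unseen data stays close to the training data, which is the situation in which the abstract states that decision tree predictions are similar.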

Year: 2024

Organizational units: Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT

Keywords: Decision trees, XGBoost, Representativeness, Feature importance

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/491502

CNR Institutional Research Information System
