CNR Institutional Research Information System

Data-Agnostic Local Neighborhood Generation

Guidotti, Riccardo; Monreale, Anna

2020

Abstract

Synthetic data generation has been widely adopted in fields such as software testing, data privacy, imbalanced learning, and machine learning explanation. In such contexts, it can be important to generate data samples located within "local" areas surrounding specific instances. Indeed, local synthetic data can help the learning phase of predictive models, and it is fundamental for methods explaining the local decision-making behavior of obscure classifiers. In explainable machine learning, each local explainer either introduces an ad-hoc procedure for neighborhood generation designed for a particular type of data, or uses a general-purpose approach that has different effects on different data types. The contribution of this paper is twofold. First, we introduce a method based on generative operators that generates synthetic neighborhoods by applying specific perturbations to a given input instance. The key factor of the proposed method is a data transformation that makes the data generation agnostic, i.e., applicable to any type of data. Second, we design a framework for evaluating the quality of local synthetic neighborhoods that exploits both supervised and unsupervised methodologies. Extensive experimentation on a wide range of datasets of different types shows the effectiveness of the proposal in generating realistic neighborhoods that are also compact and dense.

Year: 2020

Keywords:
Synthetic Neighborhood Generation
Explainable Machine Lear
Data-Agnostic Generator
Data Mining

Appears in types:
04.01 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/424644
