Data-Agnostic Local Neighborhood Generation

Guidotti, Riccardo; Monreale, Anna
2020

Abstract

Synthetic data generation has been widely adopted in fields such as software testing, data privacy, imbalanced learning, and machine learning explanation. In such contexts, it can be important to generate data samples located within "local" areas surrounding specific instances. Indeed, local synthetic data can support the learning phase of predictive models, and it is fundamental for methods that explain the local decision-making behavior of obscure classifiers. In explainable machine learning, each local explainer either introduces an ad-hoc neighborhood generation procedure designed for a particular type of data, or uses a general-purpose approach that behaves differently on different data types. The contribution of this paper is twofold. First, we introduce a method based on generative operators that builds a synthetic neighborhood by applying specific perturbations to a given input instance. The key element of the proposed method is a data transformation that makes the data generation agnostic, i.e., applicable to any type of data. Second, we design a framework for evaluating the quality of local synthetic neighborhoods using both supervised and unsupervised methodologies. An extensive experimental evaluation on a wide range of datasets of different types shows that the proposal is effective at generating realistic neighborhoods that are also compact and dense.
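The abstract does not specify the operators or the transformation, so the following is only a minimal sketch of the general idea under stated assumptions: map the instance to a real vector, perturb it there with generative operators, and map the results back. The transformation here is a hypothetical min-max encoder/decoder for numeric tabular records (for images or text, one could plug in a learned encoder/decoder instead); the two operators, Gaussian mutation and uniform crossover, are generic choices and not necessarily those used by the authors. `make_minmax_codec`, `generate_neighborhood`, and `mean_distance` are hypothetical names, and `mean_distance` is a simple compactness proxy, not the paper's evaluation measure.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def make_minmax_codec(lows: Sequence[float], highs: Sequence[float]):
    """Hypothetical data transformation for numeric tabular records."""
    def encode(x: Sequence[float]) -> Vector:
        # Scale every feature into [0, 1] using known bounds.
        return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, lows, highs)]

    def decode(z: Sequence[float]) -> Vector:
        # Clip to [0, 1] so decoded records stay within the feature bounds.
        return [lo + min(max(v, 0.0), 1.0) * (hi - lo) for v, lo, hi in zip(z, lows, highs)]

    return encode, decode


def gaussian_mutation(z: Sequence[float], rng: random.Random, sigma: float) -> Vector:
    """Operator 1: add zero-mean Gaussian noise to every coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in z]


def uniform_crossover(a: Sequence[float], b: Sequence[float], rng: random.Random) -> Vector:
    """Operator 2: copy each coordinate from `a` or `b` with equal probability."""
    return [u if rng.random() < 0.5 else v for u, v in zip(a, b)]


def generate_neighborhood(
    x,
    encode: Callable[[object], Vector],
    decode: Callable[[Vector], object],
    size: int,
    sigma: float = 0.1,
    crossover_rate: float = 0.5,
    seed: int = 0,
) -> list:
    """Build `size` synthetic neighbors of `x` by perturbing its encoding."""
    rng = random.Random(seed)
    z = encode(x)  # the only data-specific step
    latent: List[Vector] = []
    for _ in range(size):
        # Always mutate the encoding of x itself, which keeps neighbors local.
        child = gaussian_mutation(z, rng, sigma)
        if latent and rng.random() < crossover_rate:
            # Mix with an earlier neighbor to increase diversity.
            child = uniform_crossover(child, rng.choice(latent), rng)
        latent.append(child)
    # Map the perturbed vectors back to the original data space.
    return [decode(v) for v in latent]


def mean_distance(center: Sequence[float], points: Sequence[Sequence[float]]) -> float:
    """Compactness proxy: average Euclidean distance from the center."""
    return sum(math.dist(center, p) for p in points) / len(points)
```

A possible usage, with made-up feature bounds for a record of (income, age):

```python
encode, decode = make_minmax_codec(lows=[0.0, 18.0], highs=[100_000.0, 90.0])
x = [42_000.0, 35.0]
neighbors = generate_neighborhood(x, encode, decode, size=50, sigma=0.05, seed=1)
print(len(neighbors))  # 50
print(mean_distance(encode(x), [encode(n) for n in neighbors]))
```

Because the operators act only on the encoded vectors, swapping the codec is the only change needed to apply the same generator to another data type; this separation is the intuition behind a data-agnostic generator, though the paper's actual design may differ.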
Keywords
Synthetic Neighborhood Generation
Explainable Machine Learning
Data-Agnostic Generator
Data Mining

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/424644