Exploiting auto-encoders for explaining black-box classifiers

Guidotti, Riccardo
2022

Abstract

Recent years have witnessed the rise of accurate but obscure classification models that hide the logic of their internal decision processes. In this paper, we present a framework that locally explains any type of black-box classifier, working on any data type, through a rule-based model. Local explanation approaches able to accomplish this task already exist in the literature. However, they suffer from a significant limitation: they require representing data as binary vectors and constrain the local surrogate model to be trained on synthetic instances that are not representative of the real world. We overcome these deficiencies by using autoencoder-based approaches. The proposed framework first generates synthetic instances in the latent feature space and learns a latent decision tree classifier. It then selects and decodes the synthetic instances that respect the local decision rules. Independently of the data type under analysis, such synthetic instances belonging to different classes can unveil the reasons for the classification. Moreover, depending on the data type, they can be exploited to provide the most useful kind of explanation. Experiments show that the proposed framework advances the state of the art towards a comprehensive and widely usable approach that successfully guarantees various properties besides interpretability.
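
The pipeline outlined in the abstract (sample in the latent space, fit a latent decision tree, then select and decode instances that respect the local rule) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the toy invertible `encode`/`decode` pair stands in for a trained autoencoder, `black_box` is a made-up classifier, neighbours are drawn with Gaussian noise, and the surrogate is only a depth-1 decision tree (a stump).

```python
import random

# Hypothetical stand-ins: a real setup would use a trained autoencoder
# and an actual black-box model instead of these toy functions.
def encode(x):
    """Toy 'encoder': invertible linear map from 2-D data to a 2-D latent space."""
    return [x[0] + x[1], x[0] - x[1]]

def decode(z):
    """Toy 'decoder': inverse of encode."""
    return [(z[0] + z[1]) / 2, (z[0] - z[1]) / 2]

def black_box(x):
    """Assumed opaque classifier: class 1 when x0 + x1 > 1, else class 0."""
    return 1 if x[0] + x[1] > 1.0 else 0

def fit_stump(Z, y):
    """Depth-1 decision tree on binary labels: return the (feature, threshold)
    split that misclassifies the fewest points under majority voting per side."""
    best_err, best_split = None, None
    for f in range(len(Z[0])):
        for t in sorted({z[f] for z in Z}):
            left = [yi for z, yi in zip(Z, y) if z[f] <= t]
            right = [yi for z, yi in zip(Z, y) if z[f] > t]
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if best_err is None or err < best_err:
                best_err, best_split = err, (f, t)
    return best_split

def explain(x, n=500, sigma=0.5, seed=0):
    """Return a latent rule plus decoded exemplars and counter-exemplars for x."""
    rng = random.Random(seed)
    z = encode(x)
    # 1. Generate synthetic neighbours of x in the latent space.
    Z = [[zi + rng.gauss(0.0, sigma) for zi in z] for _ in range(n)]
    # 2. Label them by decoding and querying the black box.
    y = [black_box(decode(zs)) for zs in Z]
    # 3. Learn the latent surrogate and read off the rule that covers x.
    f, t = fit_stump(Z, y)
    side = z[f] <= t
    rule = (f, "<=" if side else ">", t)
    label = black_box(x)
    # 4. Decode instances that respect the rule (exemplars) and instances
    #    on the other side with a different label (counter-exemplars).
    exemplars = [decode(zs) for zs, yi in zip(Z, y)
                 if (zs[f] <= t) == side and yi == label]
    counter = [decode(zs) for zs, yi in zip(Z, y)
               if (zs[f] <= t) != side and yi != label]
    return rule, exemplars, counter

rule, exemplars, counter = explain([0.9, 0.6])
print("latent rule: z[%d] %s %.3f" % rule)
print(len(exemplars), "exemplars;", len(counter), "counter-exemplars")
```

In this toy setting the decoded exemplars share the black-box label of the explained instance, while the counter-exemplars show nearby instances that receive the other label. A realistic version would replace the stump with a full decision tree and the linear maps with a learned autoencoder.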
Explainable artificial intelligence
auto-encoders
interpretable machine learning
model-agnostic explainer
data-agnostic explainer

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/457328