Ante-Hoc Methods for Interpretable Deep Models: A Survey
Ivanoe De Falco;Giovanna Sannino
2025
Abstract
The increasing use of black-box networks in high-risk contexts has led researchers to propose explainable methods to make these networks transparent. Most methods for understanding the behavior of Deep Neural Networks (DNNs) are post-hoc approaches, whose explainability is questionable because they do not clarify the internal behavior of a model; this underscores the difficulty of interpreting the internal behavior of deep models. This systematic literature review collects the ante-hoc methods that provide an understanding of the internal mechanisms of deep models and that can be helpful to researchers who need interpretability methods to clarify DNNs. This work provides definitions of strong interpretability and weak interpretability, which are used to characterize the interpretability of the methods discussed in this article. The results of this work are divided mainly into prototype-based methods, concept-based methods, and other interpretability methods for deep models.
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| 3728637.pdf (open access) | ACM_survey_interpretability | Published version (PDF) | Creative Commons | 2.72 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.