This sounds like that: explainable audio classification via prototypical parts

Guidotti R.; Pedreschi D.
2025

Abstract

The demand for understanding machine learning models has led to the development of interpretable-by-design models that provide both outcomes and explanations. In this paper, we extend the concept of Prototypical Part Networks to the audio domain with SonicProtoPNet. This model enables “this sounds like that” reasoning for audio classification, in which a test audio instance is classified based on the prototypical parts that most resemble specific regions of specific training instances. Quantitative results on genre classification, environmental sound classification, and musical instrument recognition tasks demonstrate satisfactory performance using the Log-Mel transformation of the input audio signal, further supported by backbone pre-training on image data. Furthermore, we introduce a high-quality back-soundification method for the learned sonic prototypes, facilitating intuitive interpretation of classification decisions through auditory inspection.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
9783031789793
9783031789809
Explainable Artificial Intelligence
Explainable Audio Classification
Part Prototypical Interpretability
Files in this record:
File: Fedele-Guidotti-Pedreschi_Springer 2025.pdf (authorized users only)
Description: This Sounds Like That: Explainable Audio Classification via Prototypical Parts
Type: Publisher's version (PDF)
License: NOT PUBLIC - private/restricted access
Size: 1.13 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/549102