Information dissimilarity measures in decentralized knowledge distillation: a comparative analysis
Vadicamo L.; Carlini E.; Gennaro C.
2024
Abstract
Knowledge distillation (KD) is a key technique for transferring knowledge from a large, complex “teacher” model to a smaller, more efficient “student” model. Although initially developed for model compression, it has found applications across various domains thanks to its knowledge transfer mechanism. While Cross-Entropy (CE) and Kullback-Leibler (KL) divergence are the loss functions most commonly used in KD, this work investigates losses based on underexplored information dissimilarity measures, namely Triangular Divergence (TD), Structural Entropic Distance (SED), and Jensen-Shannon Divergence (JS), under both independent and identically distributed (iid) and non-iid data distributions. The primary contributions of this study include an empirical evaluation of these dissimilarity measures in a decentralized learning context, i.e., one in which independent clients collaborate without a central server coordinating the learning process. Additionally, the paper assesses client performance by comparing pairwise distillation with averaging among clients to conventional peer-to-peer pairwise distillation. Results indicate that while the dissimilarity measures perform comparably in iid settings, non-iid distributions favor SED and JS, which also show consistent performance across clients.
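For reference, the sketch below illustrates how the dissimilarity measures named in the abstract (JS, TD, and SED) can be plugged into a distillation loss in place of the usual KL term. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the function names, the temperature parameter T, and the particular SED formulation (complexity ratio with C(v) = exp(H(v))) are illustrative choices.

```python
# Minimal sketch of distillation losses based on the dissimilarity measures
# mentioned in the abstract. Illustrative only; not the paper's code.
import torch
import torch.nn.functional as F

EPS = 1e-8  # numerical stability for logs and divisions


def js_divergence(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between probability vectors (last dim sums to 1)."""
    m = 0.5 * (p + q)
    kl_pm = (p * (p.clamp_min(EPS) / m.clamp_min(EPS)).log()).sum(-1)
    kl_qm = (q * (q.clamp_min(EPS) / m.clamp_min(EPS)).log()).sum(-1)
    return 0.5 * (kl_pm + kl_qm)


def triangular_divergence(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Triangular divergence: sum_i (p_i - q_i)^2 / (p_i + q_i)."""
    return ((p - q) ** 2 / (p + q).clamp_min(EPS)).sum(-1)


def sed(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Structural Entropic Distance, one common formulation via the
    complexity C(v) = exp(H(v)): SED = C((p+q)/2) / sqrt(C(p) * C(q)) - 1.
    (Assumed formulation; check the paper for the exact definition used.)"""
    def complexity(v: torch.Tensor) -> torch.Tensor:
        return torch.exp(-(v.clamp_min(EPS) * v.clamp_min(EPS).log()).sum(-1))
    m = 0.5 * (p + q)
    return complexity(m) / (complexity(p) * complexity(q)).sqrt() - 1.0


def distillation_loss(student_logits, teacher_logits, measure=js_divergence, T=2.0):
    """Replace the usual KL term of KD with an alternative dissimilarity measure
    computed between temperature-softened teacher and student distributions."""
    p = F.softmax(teacher_logits / T, dim=-1)
    q = F.softmax(student_logits / T, dim=-1)
    return measure(p, q).mean()
```

In a typical setup, this term would be combined with the standard CE loss on ground-truth labels, e.g. `loss = ce + alpha * distillation_loss(student_logits, teacher_logits, measure=sed)`, with `alpha` weighting the distillation signal.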
Files
File: 2024_SISAP__Information_Dissimilarity_Measures_in_Decentralized_Knowledge_Distillation.pdf (open access)
Description: This is the submitted version (preprint) of the following paper: Molo M.J. et al., “Information Dissimilarity Measures in Decentralized Knowledge Distillation: A Comparative Analysis”, 2024, submitted to “Similarity Search and Applications. 17th International Conference, SISAP 2024, Providence, RI, USA, November 4–6, 2024, Proceedings”. The final published version is available on the publisher's website: https://link.springer.com/chapter/10.1007/978-3-031-75823-2_12.
Type: Preprint
License: Creative Commons
Size: 1.32 MB
Format: Adobe PDF

File: 978-3-031-75823-2_12.pdf (authorized users only)
Description: Information Dissimilarity Measures in Decentralized Knowledge Distillation: A Comparative Analysis
Type: Publisher's version (PDF)
License: Not public - private/restricted access
Size: 2.3 MB
Format: Adobe PDF