Decentralized edge learning: a comparative study of distillation strategies and dissimilarity measures

Vadicamo L.; Gennaro C.; Carlini E.
2026

Abstract

Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative in which model outputs are used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of different knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study examines the impact of network topology and distillation strategies in KD-based decentralized learning approaches. Our evaluation spans several dissimilarity measures, including Cross-Entropy (CE), Kullback-Leibler (KL) divergence, Triangular Divergence (TD), Jensen-Shannon (JS) divergence, Structural Entropic Distance (SED), and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the client-wise dissimilarities between a client's output and each neighbor's prediction individually, while the holistic approach computes the dissimilarity with respect to the average of the output predictions received from neighboring clients. We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the sum of pairwise distillations, especially when employing alternative measures such as TD, SED, and JS. These measures offer improved performance over conventional metrics such as CE and KL divergence.
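
The following is a minimal sketch, not the authors' implementation, of how the two distillation schemes described above could be computed for a single client. KL divergence stands in for any of the dissimilarity measures evaluated (CE, TD, JS, SED, Multi-way SED); the function names and example values are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two probability vectors, clipped for numerical stability."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def pairwise_distillation_loss(client_probs, neighbor_probs, dissim=kl_divergence):
    """Pairwise scheme: sum the dissimilarities between the client's prediction
    and each neighbor's prediction individually."""
    return sum(dissim(client_probs, q) for q in neighbor_probs)

def holistic_distillation_loss(client_probs, neighbor_probs, dissim=kl_divergence):
    """Holistic scheme: dissimilarity between the client's prediction and the
    average of the predictions received from its neighbors."""
    avg_neighbor = np.mean(np.stack(neighbor_probs), axis=0)
    return dissim(client_probs, avg_neighbor)

# Illustrative example: one client with three neighbors on a 4-class problem.
client = np.array([0.70, 0.15, 0.10, 0.05])
neighbors = [np.array([0.60, 0.20, 0.15, 0.05]),
             np.array([0.55, 0.25, 0.10, 0.10]),
             np.array([0.80, 0.10, 0.05, 0.05])]

print("pairwise:", pairwise_distillation_loss(client, neighbors))
print("holistic:", holistic_distillation_loss(client, neighbors))
```

In practice the distillation term would be added to each client's local training loss; the sketch only shows how the two aggregation choices differ for a single prediction.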
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Knowledge distillation
Decentralized learning
Information dissimilarity measure
Files in this product:

2025_Decentralized_Edge_Learning.pdf
Description: Post-print
Type: Post-print document
License: Creative Commons
Size: 3.8 MB
Format: Adobe PDF
Access: open access

Molo et al_DecentralizedEdgeLearning_2026.pdf
Description: Decentralized edge learning: A comparative study of distillation strategies and dissimilarity measures
Type: Publisher's version (PDF)
License: NOT PUBLIC - Private/restricted access
Size: 10.2 MB
Format: Adobe PDF
Access: authorized users only

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/560821
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: 0