Decentralized edge learning: a comparative study of distillation strategies and dissimilarity measures
Vadicamo L.; Gennaro C.; Carlini E.
2026
Abstract
Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative where model output is used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of various knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study provides key insights by examining the impact of network topology and distillation strategies in KD-based decentralized learning approaches. Our evaluation spans several dissimilarity measures, including Cross-Entropy, Kullback-Leibler divergence, Triangular Divergence, Jensen-Shannon divergence, Structural Entropic Distance, and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the client-wise dissimilarities between a client's output and each neighbor's prediction individually, while the holistic approach computes dissimilarity with respect to the average of the output predictions received from neighboring clients. We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the sum of pairwise distillation, especially when employing alternative measures like TD, SED, and JS. These measures offer improved performance over conventional metrics such as CE and KL divergence.
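As a rough formal sketch of the two schemes described in the abstract (the notation below is illustrative and not taken from the paper): let p_i denote client i's output prediction, N(i) its set of neighboring clients, and D(·,·) one of the dissimilarity measures listed above (e.g., CE, KL, TD, JS, SED).

```latex
% Illustrative sketch of the two distillation objectives; symbols are assumptions.
% Pairwise: sum the dissimilarities between client i's output and each
% neighbor's prediction individually.
\mathcal{L}_i^{\text{pair}} \;=\; \sum_{j \in \mathcal{N}(i)} D\bigl(p_i,\, p_j\bigr)

% Holistic: compute a single dissimilarity with respect to the average of the
% predictions received from neighboring clients.
\mathcal{L}_i^{\text{hol}} \;=\; D\!\Bigl(p_i,\; \tfrac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} p_j\Bigr)
```

Under this reading, the reported result is that optimizing the holistic objective tends to outperform the pairwise sum, particularly with TD, SED, and JS as the measure D.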
| File | Description | Type | License | Access | Size | Format |
|---|---|---|---|---|---|---|
| 2025_Decentralized_Edge_Learning.pdf | Post-print | Post-print document | Creative Commons | Open access | 3.8 MB | Adobe PDF |
| Molo et al_DecentralizedEdgeLearning_2026.pdf | Decentralized edge learning: A comparative study of distillation strategies and dissimilarity measures | Published version (PDF) | Not public - private/restricted access | Authorized users only | 10.2 MB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


