A Federated K-Means-Based Approach in eHealth Domains with Heterogeneous Data Distributions

Paragliola, G.; Ribino, P.; Mannone, M.

doi:10.5220/0012981200003837

Healthcare organizations collect and store large amounts of patient health information. However, sharing or accessing this information outside their facilities is often hindered by factors such as privacy concerns. Federated Learning (FL) systems are emerging as a way to overcome the siloed nature of health data and the barriers to sharing it. While federated approaches have been studied extensively, especially for classification problems, clustering-oriented approaches remain comparatively rare, both in the formulation of algorithms and in their application to eHealth domains. The primary objective of this paper is to introduce a federated K-means-based approach for clustering tasks in the healthcare domain and to explore the impact of heterogeneous health data distributions. The proposed federated K-means approach is evaluated on several health-related datasets by comparing it with the centralized version and by estimating the trade-off between privacy and performance. The preliminary findings suggest that, with heterogeneous health data distributions, the difference between the centralized and federated approaches is marginal, with the federated approach outperforming the centralized one on some healthcare datasets.
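
To make the idea of federated clustering concrete, the following is a minimal sketch of one simple way to federate K-means; it is not necessarily the scheme proposed in this paper. Each client assigns its own records to the current global centroids and shares only per-cluster sums and counts, which a server combines into updated centroids. The function names (`local_update`, `federated_kmeans`), the synthetic two-client data with skewed cluster proportions, and the fixed initial centroids are all assumptions made for illustration.

```python
import numpy as np


def local_update(X, centroids):
    """Client step (hypothetical helper): assign local points to the nearest
    global centroid and return per-cluster coordinate sums and point counts."""
    # Pairwise Euclidean distances, shape (n_points, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k)
    for j in range(k):
        mask = labels == j
        sums[j] = X[mask].sum(axis=0)  # zeros if the cluster is empty locally
        counts[j] = mask.sum()
    return sums, counts


def federated_kmeans(clients, init_centroids, rounds=20):
    """Server loop (hypothetical helper): aggregate client sums and counts
    and recompute the global centroids for a fixed number of rounds."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(rounds):
        total_sums = np.zeros_like(centroids)
        total_counts = np.zeros(len(centroids))
        for X in clients:
            sums, counts = local_update(X, centroids)
            total_sums += sums
            total_counts += counts
        # Keep the previous centroid for clusters that received no points.
        nonempty = total_counts > 0
        centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return centroids


# Synthetic, heterogeneous clients (assumed data): each site sees mostly one group.
rng = np.random.default_rng(0)
client_a = np.vstack([rng.normal(0.0, 0.5, size=(180, 2)),
                      rng.normal(5.0, 0.5, size=(20, 2))])
client_b = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                      rng.normal(5.0, 0.5, size=(180, 2))])
clients = [client_a, client_b]

centroids = federated_kmeans(clients, init_centroids=[[1.0, 1.0], [4.0, 4.0]])
```

In this sketch the raw records never leave a client; only aggregate statistics do. Whether sharing such aggregates is acceptable from a privacy standpoint depends on the deployment, and the paper's own privacy and performance trade-off analysis may rely on a different mechanism.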