Exploring heterogeneous data distribution issues in e-health federated systems
Paragliola G. (first author); Ribino P. (second author)
2024
Abstract
Background and Objective: Healthcare institutions produce and retain a considerable amount of health-related information about patients, which offers great potential for machine learning applications that use such data to address healthcare problems. However, accessing or sharing patients' data beyond the host institution is frequently hampered by multiple factors, such as privacy concerns. Federated Learning (FL) systems have been proposed as a potential solution to the siloed nature of health data and the obstacles to data exchange. Although FL addresses privacy concerns related to healthcare data, in real settings non-stationary and non-uniform data distributions affect the performance of ML models. This paper aims to evaluate how these issues directly impact the quality of the estimated performance of a federated neural network model in the e-health domain.

Methods: This work extends a time series-based classification problem from a traditional centralized scenario to a federated one under the hypothesis of non-stationary and non-uniform data distribution, also called heterogeneous data distribution.

Results: The effectiveness of the proposed method is assessed by measuring Precision, Recall, Accuracy, and F1-Score, and is compared with an FL approach operating under the hypothesis of stationary data. The experimental evaluation reports accuracy and precision above 0.91, as in the case of stationary data.

Conclusion: The results show that heterogeneous data makes it harder to detect positive samples, since the metrics most affected by the data distribution are Recall and F1.
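To illustrate why Accuracy and Precision can stay high while Recall and F1 drop, the following minimal sketch computes the four metrics named in the abstract from confusion-matrix counts. The function name `classification_metrics` and all counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts (NOT from the paper): 1000 samples, 100 of them positive.
# Case A: few positives are missed.
few_missed = classification_metrics(tp=90, fp=5, tn=895, fn=10)
# Case B: same false positives, but many more positives are missed.
many_missed = classification_metrics(tp=60, fp=5, tn=895, fn=40)

for name, m in (("few missed", few_missed), ("many missed", many_missed)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this toy setting, missing more positive samples leaves Accuracy and Precision above 0.91 because true negatives dominate and false positives are unchanged, while Recall and F1 fall noticeably. This matches the pattern the conclusion describes, though the actual magnitudes in the paper may differ.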
File: Exploring heterogeneous data distribution issues in e-health federated systems.pdf
Type: Post-print document
License: Not public (private/restricted access)
Size: 1.86 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.