Exploring heterogeneous data distribution issues in e-health federated systems

Paragliola G. (first author); Ribino P. (second author)
2024

Abstract

Background and Objective: Healthcare institutions produce and retain a considerable amount of health-related information about patients, which offers great potential for machine-learning-based medical applications that use such data to address healthcare problems. However, accessing or sharing patients’ data beyond the host institution is frequently hampered by multiple factors, such as privacy concerns. Federated Learning (FL) systems have been proposed as a potential solution to the siloed nature of health data and the hurdles in data exchange. Although FL addresses privacy concerns related to healthcare data, in real settings non-stationary and non-uniform data distributions affect the performance of ML models. This paper evaluates how these issues directly impact the quality of the estimated performance of a federated neural network model in the e-health domain.

Methods: This work extends a time-series classification problem from a traditional centralized scenario to a federated one under the hypothesis of a non-stationary and non-uniform data distribution, also called a heterogeneous data distribution.

Results: The effectiveness of the proposed method is assessed by measuring Precision, Recall, Accuracy and F1-Score, and it is compared with an FL approach working under the hypothesis of stationary data. The experimental evaluation reports an accuracy and a precision of over 0.91, as in the case of stationary data.

Conclusion: The results show that heterogeneous data make it harder to detect positive samples, since the metrics most affected by the data distribution are Recall and F1.
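The conclusion above turns on how the four reported metrics respond differently to missed positive samples. The following Python sketch is only an illustration of the standard metric definitions: the `Confusion` class, the `scores` function and the confusion-matrix counts are assumptions made up for this example and are not taken from the paper's experiments. It shows how Accuracy and Precision can both stay above 0.91 while Recall and F1 are much lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical container for this sketch)."""
    tp: int  # positives correctly detected
    fp: int  # negatives wrongly flagged as positive
    fn: int  # positives that were missed
    tn: int  # negatives correctly rejected


def scores(c: Confusion) -> dict:
    """Compute Accuracy, Precision, Recall and F1 from the counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts: 100 positives out of 1000 samples, 40 of them missed.
example = Confusion(tp=60, fp=5, fn=40, tn=895)
result = scores(example)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, the many true negatives keep Accuracy high and the few false positives keep Precision high, whereas the missed positives lower Recall and, through it, F1.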
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR - Sede Secondaria Napoli
Classification
Continuous learning
Federated learning
Healthcare informatics
Time series analysis
Files in this record:
Exploring heterogeneous data distribution issues in e-health federated systems.pdf (not available)
Type: post-print document
License: not public, private/restricted access
Size: 1.86 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/517903
Citations: Scopus 2