CNR Institutional Research Information System

Exploring heterogeneous data distribution issues in e-health federated systems

Paragliola G. (first author); Ribino P. (second author)

2024

Abstract

Background and Objective: Healthcare institutions produce and retain a considerable amount of health-related patient information, which offers great potential for machine-learning-based medical applications that use such data to address healthcare problems. However, accessing or sharing patients’ data beyond the host institution is frequently hampered by multiple factors, such as privacy concerns. Federated Learning (FL) systems have been proposed as a potential solution to the siloed nature of health data and the hurdles in data exchange. Although FL addresses privacy concerns related to healthcare data, in real settings non-stationary and non-uniform data distributions affect the performance of ML models. This paper aims to evaluate how these issues directly impact the quality of the estimated performance of a federated neural network model in the e-health domain.

Methods: This work extends a time-series classification problem from a traditional centralized scenario to a federated one under the hypothesis of non-stationary and non-uniform data distribution, also called heterogeneous data distribution.

Results: The effectiveness of the proposed method is assessed by measuring Precision, Recall, Accuracy and F1-Score, and compared with an FL approach working under the hypothesis of stationary data. The experimental evaluation reports accuracy and precision above 0.91, as in the case of stationary data.

Conclusion: The results show that heterogeneous data makes it harder to detect positive samples, since the metrics most affected by the data distribution are Recall and F1.
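The abstract reports four metrics and notes that Recall and F1 suffer most when positive samples are missed. The following is a minimal sketch of how these metrics relate to one another, not the authors' evaluation code: the function name `binary_metrics` and the toy labels are assumptions invented for illustration and do not come from the paper's data.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical): 100 samples, 20 of them positive.
y_true = [1] * 20 + [0] * 80
# A classifier that finds only half of the positives and raises no false alarms.
y_pred = [1] * 10 + [0] * 10 + [0] * 80

m = binary_metrics(y_true, y_pred)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this toy case, accuracy and precision remain high while recall and F1 fall, because missed positives count only as false negatives, which enter recall (and therefore F1) but not precision.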

Year: 2024
Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR - Sede Secondaria Napoli
Keywords: Classification; Continuous learning; Federated learning; Healthcare informatics; Time series analysis
Appears in types: 01.01 Journal article

Files in this record:

File	Type	License	Size	Format
Exploring heterogeneous data distribution issues in e-health federated systems.pdf (not available)	Post-print document	Not public (private/restricted access)	1.86 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/517903
