A Privacy-Preserving and Standard-Based Architecture for Secondary Use of Clinical Data

Mario Ciampi; Mario Sicuranza; Stefano Silvestri
2022

Abstract

Clinical data come in heterogeneous formats and standards, spanning structured, semi-structured, and unstructured data, and contain sensitive information. Both factors call for specific approaches and methodologies that permit the extraction of the valuable information buried in such data. Although many challenges in processing this information and using it for further purposes have not been fully addressed, recent techniques based on machine learning and big data analytics can support the information extraction process for the secondary use of clinical data. In particular, these techniques can facilitate the transformation of heterogeneous data into a common standard format. They can also be exploited to define anonymization or pseudonymization approaches that respect the privacy requirements stated in the General Data Protection Regulation, the Health Insurance Portability and Accountability Act, and other national and regional laws. Compliance with these laws requires that only de-identified clinical and personal data be processed for secondary analyses, in particular when data are shared or exchanged across different institutions. This work proposes a modular architecture capable of collecting clinical data from heterogeneous sources and transforming them into data useful for secondary uses, such as research, governance, and medical education. The proposed architecture exploits appropriate modules and algorithms to carry out the transformations (pseudonymization and standardization) required to use data for secondary purposes, and provides efficient tools to facilitate retrieval and analysis. Preliminary experimental tests show good accuracy in terms of quantitative evaluations.
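As a rough illustration of the two transformations named in the abstract (pseudonymization and standardization), the sketch below replaces a direct identifier with a keyed hash and maps a source record to a minimal FHIR-style Patient structure. It is not the architecture described in the paper: the record fields (`national_id`, `sex`, `birth_date`), the key handling, and both function names are hypothetical, and HMAC-SHA256 is only one possible pseudonymization technique.

```python
import hashlib
import hmac

# Hypothetical secret key (assumption); a real deployment would keep it
# with a trusted party, separate from the pseudonymized data.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym: the same input and key give the same output."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def to_fhir_like_patient(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Map a hypothetical source record to a minimal FHIR-style Patient dict.

    Direct identifiers (name, national ID) are not copied to the output;
    only a keyed pseudonym of the national ID is kept.
    """
    return {
        "resourceType": "Patient",
        "id": pseudonymize(record["national_id"], key),
        "gender": record.get("sex", "unknown"),
        # Keep only the birth year to reduce re-identification risk.
        "birthDate": record["birth_date"][:4],
    }


# Hypothetical source record, for illustration only.
source = {
    "national_id": "ABC123",
    "name": "Jane Doe",
    "sex": "female",
    "birth_date": "1980-05-17",
}
patient = to_fhir_like_patient(source)
```

Because the pseudonym is deterministic for a given key, records about the same patient coming from different sources can still be linked after de-identification, while whoever lacks the key cannot recompute the mapping from the identifier.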
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
ETL architecture
secondary use of clinical data
HL7 FHIR
information retrieval
privacy laws
pseudonymization
Files in this record:
information-13-00087-v2.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 1.94 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/443929