CNR Institutional Research Information System

The privacy protection mechanism in the health context is becoming a crucial task given the exponential increase in the adoption of the Electronic Health Records (EHRs) all around the world. This kind of data can be used for medical investigation and research only if it is filtered out of all the so called Protected Health Information (PHI). This paper proposes a clinical de-identification system based on deep learning techniques for Named Entity Recognition and aimed at recognizing PHI entities to be replaced by surrogates in EHRs for anonymization purposes. This system is based on ELECTRA, a recent neural language model, and is enhanced through a sub-document level analysis aimed at grouping input sentences together, through a Sentences Grouping Factor (SGF), with the aim of broadening the representation context and consequently enhancing its ability to learn. This system was experimentally tested on the official dataset distributed in 2014 by Informatics for Integrating Biology & the Bedside research group, exhibiting superior performance compared to the state of the art in terms of detection at the category level, crucial for properly substituting PHI entities with surrogates. The effectiveness of the proposed system with respect to its components has been also confirmed by a further experimental analysis performed by substituting BERT language model in place of ELECTRA and varying SGF in accordance with limitations concerning the maximum input size for the language model used.

Clinical de-identification using sub-document analysis and ELECTRA

R Catelli;F Gargiulo;E Damiano;M Esposito;G De Pietro

2021

Abstract

The privacy protection mechanism in the health context is becoming a crucial task given the exponential increase in the adoption of the Electronic Health Records (EHRs) all around the world. This kind of data can be used for medical investigation and research only if it is filtered out of all the so called Protected Health Information (PHI). This paper proposes a clinical de-identification system based on deep learning techniques for Named Entity Recognition and aimed at recognizing PHI entities to be replaced by surrogates in EHRs for anonymization purposes. This system is based on ELECTRA, a recent neural language model, and is enhanced through a sub-document level analysis aimed at grouping input sentences together, through a Sentences Grouping Factor (SGF), with the aim of broadening the representation context and consequently enhancing its ability to learn. This system was experimentally tested on the official dataset distributed in 2014 by Informatics for Integrating Biology & the Bedside research group, exhibiting superior performance compared to the state of the art in terms of detection at the category level, crucial for properly substituting PHI entities with surrogates. The effectiveness of the proposed system with respect to its components has been also confirmed by a further experimental analysis performed by substituting BERT language model in place of ELECTRA and varying SGF in accordance with limitations concerning the maximum input size for the language model used.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2021
			
	Strutture organizzative
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	Parole chiave
	
				deep learning
analytical models;privacy;conferences;information filters;electronic healthcare;task analysis
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
prod_458805-doc_178469.pdf solo utenti autorizzati Descrizione: Clinical de-identification using sub-document analysis and ELECTRA Tipologia: Versione Editoriale (PDF) Licenza: Nessuna licenza dichiarata (non attribuibile a prodotti successivi al 2023) Dimensione 314.06 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	314.06 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/429197

Citazioni

ND

ND

ND

social impact