Improving Biomedical Information Extraction with Word Embeddings Trained on Closed-Domain Corpora

Stefano Silvestri, Francesco Gargiulo, Mario Ciampi
2019

Abstract

Named Entity Recognition (NER) systems enable the extraction of complex concepts and text mining from natural language documents. Currently, NER systems based on Deep Learning (DL) approaches achieve state-of-the-art performance when applied to general-domain texts. On the other hand, the performance of these systems drops when they are applied to texts belonging to specialised domains, such as the biomedical one. In particular, Biomedical NER (B-NER) is a crucial task for the automatic analysis of medical documents, such as Electronic Health Records (EHRs), in order to support the work of physicians and researchers. New approaches are therefore required to boost the performance of B-NER systems. In this paper we analyze the behaviour of a B-NER DL architecture specifically devoted to Italian EHRs, focusing on the contribution of different Word Embedding (WE) models used as the input text representation layer. The results show the substantial contribution of WEs trained on a closed-domain corpus composed exclusively of documents from the biomedical domain. The resulting improvements are comparable with those obtained using the most recent and complex neural language models, such as ELMo or BERT, which have a much higher computational complexity than classic WE approaches.
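The abstract describes using pre-trained word embeddings as the input representation layer of a deep B-NER architecture. The sketch below illustrates the general idea only, not the authors' exact pipeline: it trains classic word embeddings on a placeholder closed-domain corpus with gensim Word2Vec and plugs them into the embedding layer of a BiLSTM token tagger in PyTorch. The corpus contents, vector size, hidden size and tag count are illustrative assumptions.

```python
# Hedged sketch: closed-domain word embeddings feeding a BiLSTM NER tagger.
# All data and hyperparameters here are illustrative, not the paper's settings.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# 1) Train classic word embeddings on the closed-domain biomedical corpus
#    (here a tiny placeholder: one tokenised sentence per inner list).
corpus = [
    ["paziente", "affetto", "da", "carcinoma", "polmonare"],
    ["terapia", "con", "cisplatino", "ben", "tollerata"],
]
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, sg=1)

# 2) Build an embedding matrix aligned with the tagger vocabulary.
vocab = {"<pad>": 0, "<unk>": 1}
for word in w2v.wv.index_to_key:
    vocab[word] = len(vocab)
emb_matrix = torch.zeros(len(vocab), w2v.vector_size)
for word, idx in vocab.items():
    if word in w2v.wv:
        emb_matrix[idx] = torch.tensor(w2v.wv[word])

# 3) Use the pre-trained vectors as the input text representation layer
#    of a BiLSTM tagger that emits one label score vector per token.
class BiLSTMTagger(nn.Module):
    def __init__(self, emb_matrix, hidden_size, num_tags):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            emb_matrix, freeze=False, padding_idx=0)
        self.lstm = nn.LSTM(emb_matrix.size(1), hidden_size,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)             # (batch, seq_len, 2 * hidden_size)
        return self.classifier(h)       # per-token tag scores

tagger = BiLSTMTagger(emb_matrix, hidden_size=128, num_tags=9)  # e.g. IOB2 tags
scores = tagger(torch.tensor([[vocab["paziente"], vocab["carcinoma"]]]))
```

The design point evaluated in the paper is precisely this input layer: swapping general-domain vectors for vectors trained only on biomedical text, which is far cheaper to compute than ELMo- or BERT-style contextual language models.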
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
ISBN: 978-1-7281-2999-0
Keywords: Named Entity Recognition; Deep Learning; Word Embeddings; Biomedical NER

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/361959