CNR Institutional Research Information System

A Big Data Approach for Health Data Information Retrieval

Mario Ciampi;Elio Masciari;Giuseppe de Pietro;Stefano Silvestri

2019

Abstract

In this paper we address the problem of analyzing collections of biomedical data in order to find semantically similar textual documents. In detail, we leverage Word Embeddings models obtained with the word2vec algorithm, together with a dedicated Big Data architecture for managing them, to define an approach that retrieves semantically similar texts from a very large biomedical text corpus. The proposed architecture was developed to improve on a previous implementation: it lowers the computation time, which makes it possible to use the whole PubMed library as the dataset and demonstrates the usability of this methodology in a real context.
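The retrieval idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a text is represented by the average of its word vectors and that texts are compared by cosine similarity, which is one common way to use word2vec-style embeddings. The vectors in `EMBEDDINGS`, the sample documents, and the function names `embed`, `cosine`, and `most_similar` are all hypothetical and exist only for illustration. A real system would load vectors trained on a biomedical corpus and would need a distributed architecture to handle a collection the size of PubMed.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load vectors trained with word2vec on a large corpus.
EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.0],
    "attack": [0.6, 0.3, 0.1],
    "infarction": [0.7, 0.2, 0.1],
    "bone": [0.0, 0.9, 0.1],
    "fracture": [0.1, 0.8, 0.2],
}


def embed(text):
    """Average the vectors of the known words in text; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, documents, top_k=1):
    """Return the top_k documents ranked by cosine similarity to the query."""
    query_vec = embed(query)
    if query_vec is None:
        return []
    scored = []
    for doc in documents:
        doc_vec = embed(doc)
        if doc_vec is not None:
            scored.append((cosine(query_vec, doc_vec), doc))
    scored.sort(reverse=True)  # highest similarity first
    return [doc for _, doc in scored[:top_k]]


# Hypothetical two-document corpus.
DOCS = ["cardiac infarction", "bone fracture"]

if __name__ == "__main__":
    print(most_similar("heart attack", DOCS))
```

With these toy vectors the query "heart attack" shares no word with "cardiac infarction", yet that document ranks first because its averaged vector points in nearly the same direction. This is the kind of semantic match that keyword search would miss.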

	Year
	
				2019
			
	Organizational units
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	ISBN
	
				978-1-7281-1867-3
			
	Keywords
	
				Big Data
				Information Retrieval
				Word Embeddings
				Big Data Architecture
				Natural Language Processing
			
	Appears in types:
	
				04.01 Contribution in conference proceedings

Files in this item:

File	License	Size	Format
Pubblicazione18.pdf (not available)	Not public - private/restricted access	789.65 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/360940
