A Big Data Approach for Health Data Information Retrieval

Mario Ciampi;Giuseppe de Pietro;Stefano Silvestri
2019

Abstract

In this paper we address the problem of analyzing biomedical data collections in order to find semantically similar textual documents. In detail, we leverage word embedding models obtained with the word2vec algorithm, together with a dedicated Big Data architecture for managing them, to define an approach that retrieves semantically similar texts from a very large biomedical text corpus. The proposed architecture was developed to improve on a previous implementation: it lowers the computation time and thereby allows the whole PubMed library to be used as the dataset, which also demonstrates the usability of the methodology in a real context.
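To make the idea of embedding-based retrieval concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a document is represented by the average of its word vectors and that documents are ranked by cosine similarity; the abstract does not state how similarity is computed, so both choices are assumptions. The vectors in `TOY_VECTORS` and the function names `embed`, `cosine` and `most_similar` are hypothetical, and a real system would load vectors trained with word2vec on the corpus.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
# A real model would be trained with word2vec on the biomedical corpus.
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.0],
    "attack": [0.7, 0.0, 0.3],
    "infarction": [0.8, 0.1, 0.2],
    "bone": [0.0, 0.9, 0.1],
    "fracture": [0.1, 0.8, 0.2],
}


def embed(text, vectors):
    """Average the vectors of the known words; return None if none is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, documents, vectors, top_k=1):
    """Return up to top_k (score, document) pairs, best match first."""
    query_vec = embed(query, vectors)
    if query_vec is None:
        return []
    scored = []
    for doc in documents:
        doc_vec = embed(doc, vectors)
        if doc_vec is not None:
            scored.append((cosine(query_vec, doc_vec), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    docs = ["cardiac infarction", "bone fracture"]
    print(most_similar("heart attack", docs, TOY_VECTORS))
```

With these toy vectors the query "heart attack" ranks "cardiac infarction" above "bone fracture" even though they share no words, which is the kind of semantic match that lexical search misses. A linear scan like this one does not scale to a corpus the size of PubMed, which is the motivation for a dedicated architecture.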
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
978-1-7281-1867-3
Big Data
Information Retrieval
Word Embeddings
Big Data Architecture
Natural Language Processing

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/360940