A Big Data Approach for Health Data Information Retrieval
Mario Ciampi;Giuseppe de Pietro;Stefano Silvestri
2019
Abstract
In this paper we address the problem of analyzing collections of biomedical data in order to find semantically similar textual documents. In detail, we leverage Word Embeddings models obtained with the word2vec algorithm, together with a dedicated Big Data architecture for managing them, to define an approach that retrieves semantically similar texts from a huge biomedical text corpus. The proposed architecture was developed to improve on a previous implementation: it lowers the computational time, which makes it possible to use the whole PubMed library as a dataset, and it demonstrates the usability of this methodology in a real context.
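As a rough illustration of the retrieval idea described in the abstract, the sketch below ranks texts by the cosine similarity of their averaged word vectors. It is not the authors' implementation: the tiny `EMBEDDINGS` table, the averaging strategy, and the function names `document_vector`, `cosine`, and `most_similar` are all assumptions made for this example. A real system would load vectors trained with word2vec on a large corpus and would distribute the search over a Big Data architecture.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy embeddings (assumption): real vectors would come from a
# word2vec model trained on a biomedical corpus and have far more dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "therapy": [0.3, 0.7, 0.1],
    "treatment": [0.2, 0.8, 0.0],
    "protein": [0.1, 0.2, 0.9],
    "gene": [0.0, 0.3, 0.8],
}


def document_vector(text: str, embeddings: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of the known words in a text (one simple choice among many)."""
    tokens = [t for t in text.lower().split() if t in embeddings]
    if not tokens:
        return None  # no known word, so the text cannot be represented
    dim = len(embeddings[tokens[0]])
    return [sum(embeddings[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: str,
    corpus: List[str],
    embeddings: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k corpus texts ranked by similarity to the query."""
    query_vec = document_vector(query, embeddings)
    if query_vec is None:
        return []
    scored = []
    for doc in corpus:
        doc_vec = document_vector(doc, embeddings)
        if doc_vec is not None:
            scored.append((doc, cosine(query_vec, doc_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


CORPUS = ["cancer therapy", "gene protein", "tumor treatment"]
RESULTS = most_similar("tumor therapy", CORPUS, EMBEDDINGS)
```

The linear scan in `most_similar` compares the query against every document, which is the step that becomes expensive on a corpus the size of PubMed and that a distributed architecture would need to speed up.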


