Topical Clustering of Biomedical Abstracts by Self-organizing Maps

P Arrigo
2004

Abstract

One of the major challenges in the post-genomic era is speeding up the identification of molecules involved in a specific disease (molecular targets). Although experimental procedures have greatly enhanced analytical capability, textual data analysis still plays a central role in designing experiments and in collecting data. Extracting useful information from published papers still depends heavily on human expertise in selecting and retrieving the relevant papers. Searching for abstracts in the MEDLINE or PubMed databases is a common activity for researchers. Navigating textual databases is often difficult, and in many cases the user can retrieve only a list of abstracts, without any additional information about how the content of each abstract relates to the submitted query. In the last decade, natural language processing tools have gained relevance in the bioinformatics field. Retrieving and organizing textual information by topic allows the user to select and analyze only a reduced set of papers. In this work, we present the application of a document clustering system based on self-organizing maps that reorganizes the abstracts retrieved by a PubMed query into a hierarchy of clusters. The system is available at http://www.biocomp.ge.ismac.cnr.it.
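The abstract does not describe the internals of the clustering system. Purely as an illustration of the general technique, the sketch below maps abstracts onto the cells of a small self-organizing map (SOM) trained on L2-normalized term-frequency vectors. The tokenizer, the stop-word list, the grid size, the decay schedules, and the function names (`tokenize`, `vectorize`, `train_som`, `cluster_abstracts`) are all our own assumptions, not the author's implementation. Only a single flat map is shown; a hierarchy could, for example, be obtained by training a smaller map on the abstracts that fall into each cell.

```python
import math
import random
import re
from collections import Counter

# Hypothetical minimal stop-word list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "by", "with", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def vectorize(docs):
    """Turn each document into an L2-normalized term-frequency vector."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for w, n in Counter(toks).items():
            vec[index[w]] = float(n)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialized prototype vector per grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            x = vectors[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks
            # Best-matching unit: the node whose prototype is closest to x.
            bmu = min(weights, key=lambda n: sq_dist(weights[n], x))
            for node, w in weights.items():
                # Gaussian neighbourhood on the grid around the BMU.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def cluster_abstracts(docs, rows=2, cols=2, epochs=50, seed=0):
    """Group abstract indices by the grid cell of their best-matching unit."""
    vectors, _ = vectorize(docs)
    weights = train_som(vectors, rows, cols, epochs, seed=seed)
    clusters = {}
    for i, x in enumerate(vectors):
        bmu = min(weights, key=lambda n: sq_dist(weights[n], x))
        clusters.setdefault(bmu, []).append(i)
    return clusters


if __name__ == "__main__":
    abstracts = [
        "Kinase inhibitors block tumour cell proliferation.",
        "Microarray gene expression profiling of yeast.",
        "Protein structure prediction with hidden Markov models.",
    ]
    print(cluster_abstracts(abstracts))
```

Each key of the returned dictionary is a map cell and each value lists the indices of the abstracts assigned to it; because the map preserves topology, neighbouring cells tend to hold related abstracts.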
Istituto per lo Studio delle Macromolecole - ISMAC - Sede Milano
ISBN: 978-1-4020-7735-7
Keywords: text mining; self-organizing maps; conceptual clustering

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/112857