Extending the "Facets" concept by applying NLP tools to catalog records of scientific literature

Picchi E; Sassi M; Biagioni S; Giannini S
2010

Abstract

We present the prototype of an "intelligent" navigation system implemented on the contents of PUMA (http://puma.isti.cnr.it), a digital library of scientific literature. The system integrates our core textual search engine, DBT, with the TextPower (TP) technology. TP is based on NLP techniques and linguistic resources and provides tools specialized for the evaluation, analysis, classification and browsing of scientific literature. TP extends the facet concept by extracting "field + content" pairs not only from structured fields but also from free text, e.g. abstracts, using a linguistic-statistical approach to annotate relevant terminology, named entities, etc. The enriched text can be queried, analysed, and classified using a new version of the DBT System known as "DBT&Facets". DBT&Facets has been implemented on the full bibliographic records of the documents archived in PUMA, the digital library of the Italian National Research Council (CNR). PUMA is a user-focused, service-oriented infrastructure which manages 30 CNR institutional repositories containing about 25,000 published or open access documents in a wide variety of disciplines. In an open domain like scientific documentation, our approach, based on the criterion of "semantic similarity", is useful (and perhaps more objective than one based on hierarchical elements) because it makes it possible to link different types of information, across domains if necessary. DBT&Facets is an advanced search tool that lets users query the collection, refine their results, and identify particular relations between them. The aim of the project has been to structure a knowledge system of domain-specific information which assists users by suggesting possible directions for their search.
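The idea of "field + content" pairs drawn from both structured fields and free text can be illustrated with a minimal Python sketch. It is not the DBT&Facets or TextPower implementation: the records, the field names (`year`, `term`), the fixed term list, and the functions `extract_facets`, `build_index`, `refine` and `facet_counts` are all hypothetical, and simple substring matching stands in for the linguistic-statistical annotation described in the abstract.

```python
from collections import defaultdict

# Hypothetical catalog records; contents are illustrative only.
RECORDS = [
    {"id": 1, "year": "2010",
     "abstract": "A search engine for digital libraries using NLP tools."},
    {"id": 2, "year": "2009",
     "abstract": "Named entities in scientific literature."},
    {"id": 3, "year": "2010",
     "abstract": "NLP tools for terminology extraction."},
]

# Toy term list standing in for a real terminology annotator (assumption).
TERMS = ["digital libraries", "nlp tools", "search engine",
         "terminology extraction"]


def extract_facets(record):
    """Return the set of (field, content) pairs for one record."""
    # Pair taken from a structured field.
    facets = {("year", record["year"])}
    # Pairs taken from free text, by naive substring matching.
    text = record["abstract"].lower()
    for term in TERMS:
        if term in text:
            facets.add(("term", term))
    return facets


def build_index(records):
    """Map each (field, content) pair to the ids of records carrying it."""
    index = defaultdict(set)
    for record in records:
        for facet in extract_facets(record):
            index[facet].add(record["id"])
    return index


def refine(index, selected):
    """Return ids of records that match every selected facet."""
    id_sets = [index.get(facet, set()) for facet in selected]
    return set.intersection(*id_sets) if id_sets else set()


def facet_counts(index, ids):
    """Count, per facet, how many of the given records carry it."""
    return {facet: len(members & ids)
            for facet, members in index.items() if members & ids}
```

Calling `refine(build_index(RECORDS), [("term", "nlp tools"), ("year", "2010")])` narrows the result set to records carrying both pairs, and `facet_counts` on that set gives the kind of per-facet tallies a faceted interface could offer as suggestions for the next search step.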
Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
978-90-77484-15-9
NLP tools
Digital libraries
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/86042