Extending the "Facets" concept by applying NLP tools to catalog records of scientific literature
Picchi E; Sassi M; Biagioni S; Giannini S
2010
Abstract
We present the prototype of an "intelligent" navigation system implemented on the contents of PUMA (http://puma.isti.cnr.it), a digital library of scientific literature. The system integrates our core textual search engine, DBT, with the TextPower (TP) technology. TP is based on NLP techniques and linguistic resources and provides tools specialized for the evaluation, analysis, classification and browsing of scientific literature. TP extends the facet concept by extracting "field + content" pairs not only from structured fields but also from free text, e.g. abstracts, using a linguistic-statistical approach to annotate relevant terminology, named entities, etc. The enriched text can be queried, analysed, and classified using a new version of the DBT system known as "DBT&Facets". DBT&Facets has been implemented on the full bibliographic records of the documents archived in the PUMA digital library of the Italian National Research Council (CNR). PUMA is a user-focused, service-oriented infrastructure that manages 30 CNR institutional repositories containing about 25,000 published or open access documents in a wide variety of disciplines. In an open domain such as scientific documentation, our approach, based on the criterion of "semantic similarity", is useful, and perhaps more objective than one based on hierarchical elements, because it makes it possible to link different types of information, across domains if necessary. DBT&Facets is an advanced search tool that lets users query and refine their results and identify particular relations between them. The aim of the project has been to structure a knowledge system of domain-specific information that assists users by suggesting possible directions for their search.

File | Size | Format |
---|---|---|
prod_120718-doc_84425.pdf (description: paper; type: publisher's version (PDF); authorized users only) | 148.4 kB | Adobe PDF |
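
The "field + content" pairs described in the abstract can be illustrated with a minimal Python sketch. This is not the DBT&Facets or TextPower implementation: the `Record` class, the `TERMS` lookup table, and the functions `enrich`, `refine` and `facet_counts` are all hypothetical names introduced here. In particular, a fixed phrase lookup stands in for TP's linguistic-statistical annotation, which the abstract does not specify in detail.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical phrase list. The real TP annotation is linguistic-statistical,
# not a fixed lookup; this table only imitates its output.
TERMS = {
    "digital library": "term",
    "search engine": "term",
    "CNR": "organization",
}


@dataclass
class Record:
    title: str
    year: int
    abstract: str
    facets: set = field(default_factory=set)  # set of (field, content) pairs


def enrich(record):
    """Attach (field, content) pairs from a structured field and from free text."""
    record.facets.add(("year", str(record.year)))  # structured field
    text = record.abstract.lower()
    for phrase, kind in TERMS.items():  # free text (naive substring match)
        if phrase.lower() in text:
            record.facets.add((kind, phrase))
    return record


def refine(records, pair):
    """Keep only the records that carry the given (field, content) pair."""
    return [r for r in records if pair in r.facets]


def facet_counts(records):
    """Count how many records carry each pair, to suggest further refinements."""
    return Counter(pair for r in records for pair in r.facets)


records = [
    enrich(Record("A", 2010, "A digital library of CNR publications.")),
    enrich(Record("B", 2009, "A search engine for a digital library.")),
    enrich(Record("C", 2010, "Notes on parsing.")),
]
hits = refine(records, ("term", "digital library"))
suggestions = facet_counts(hits)
```

Here `hits` contains records A and B, and `suggestions` lists the pairs that could narrow the result set further, which is one plausible reading of how facets "suggest possible directions" for a search.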
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.