T2K: a System for Automatically Extracting and Organizing Knowledge from Texts

Felice Dell'Orletta; Giulia Venturi; Andrea Cimino; Simonetta Montemagni
2014

Abstract

In this paper, we present T2K, a suite of tools for automatically extracting domain-specific knowledge from collections of Italian and English texts. T2K (Text-To-Knowledge v2) relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate and incremental representation of the content of vast repositories of unstructured documents. Extracted knowledge ranges from domain-specific entities and named entities to the relations connecting them, and can be used for indexing document collections with respect to different information types. T2K also includes "linguistic profiling" functionalities that support the user in constructing the acquisition corpus, e.g. in selecting texts that belong to the same genre or share the same degree of specialization, or in monitoring the "added value" of newly inserted documents. T2K is a web application that can be accessed from any browser through a personal account, and it has been tested in a wide range of domains.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Felice Dell'Orletta it
dc.authority.people Giulia Venturi it
dc.authority.people Andrea Cimino it
dc.authority.people Simonetta Montemagni it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/19 16:13:25 -
dc.date.available 2024/02/19 16:13:25 -
dc.date.issued 2014 -
dc.description.affiliations ILC - Istituto di linguistica computazionale "Antonio Zampolli" -
dc.description.allpeople Felice Dell'Orletta; Giulia Venturi; Andrea Cimino; Simonetta Montemagni -
dc.description.allpeopleoriginal Felice Dell'Orletta, Giulia Venturi, Andrea Cimino, Simonetta Montemagni -
dc.description.fulltext none en
dc.description.numberofauthors 4 -
dc.identifier.isbn 978-2-9517408-8-4 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/226944 -
dc.identifier.url http://www.lrec-conf.org/proceedings/lrec2014/pdf/590_Paper.pdf -
dc.language.iso eng -
dc.relation.conferencedate 26-31 May 2014 -
dc.relation.conferencename International Conference on Language Resources and Evaluation (LREC) -
dc.relation.conferenceplace Reykjavik -
dc.relation.firstpage 2062 -
dc.relation.lastpage 2070 -
dc.subject.keywords Natural Language Processing -
dc.subject.keywords Information Extraction -
dc.subject.keywords Knowledge Management -
dc.subject.singlekeyword Natural Language Processing *
dc.subject.singlekeyword Information Extraction *
dc.subject.singlekeyword Knowledge Management *
dc.title T2K: a System for Automatically Extracting and Organizing Knowledge from Texts en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 285670 -

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/226944