CNR Institutional Research Information System

T2K: a System for Automatically Extracting and Organizing Knowledge from Texts

Felice Dell'Orletta; Giulia Venturi; Andrea Cimino; Simonetta Montemagni

2014

Abstract

In this paper, we present T2K, a suite of tools for automatically extracting domain-specific knowledge from collections of Italian and English texts. T2K (Text-To-Knowledge v2) relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate and incremental representation of the content of vast repositories of unstructured documents. Extracted knowledge ranges from domain-specific entities and named entities to the relations connecting them, and can be used for indexing document collections with respect to different information types. T2K also includes "linguistic profiling" functionalities aimed at supporting the user in constructing the acquisition corpus, e.g. in selecting texts belonging to the same genre or characterized by the same degree of specialization, or in monitoring the "added value" of newly inserted documents. T2K is a web application, accessible from any browser through a personal account, that has been tested in a wide range of domains.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Felice Dell'Orletta	it
dc.authority.people	Giulia Venturi	it
dc.authority.people	Andrea Cimino	it
dc.authority.people	Simonetta Montemagni	it
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contribution in conference proceedings	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/19 16:13:25	-
dc.date.available	2024/02/19 16:13:25	-
dc.date.issued	2014	-
dc.description.abstracteng	In this paper, we present T2K, a suite of tools for automatically extracting domain-specific knowledge from collections of Italian and English texts. T2K (Text-To-Knowledge v2) relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate and incremental representation of the content of vast repositories of unstructured documents. Extracted knowledge ranges from domain-specific entities and named entities to the relations connecting them, and can be used for indexing document collections with respect to different information types. T2K also includes "linguistic profiling" functionalities aimed at supporting the user in constructing the acquisition corpus, e.g. in selecting texts belonging to the same genre or characterized by the same degree of specialization, or in monitoring the "added value" of newly inserted documents. T2K is a web application, accessible from any browser through a personal account, that has been tested in a wide range of domains.	-
dc.description.affiliations	ILC - Istituto di linguistica computazionale "Antonio Zampolli"	-
dc.description.allpeople	Felice Dell'Orletta; Giulia Venturi; Andrea Cimino; Simonetta Montemagni	-
dc.description.allpeopleoriginal	Felice Dell'Orletta, Giulia Venturi, Andrea Cimino, Simonetta Montemagni	-
dc.description.fulltext	none	en
dc.description.numberofauthors	4	-
dc.identifier.isbn	978-2-9517408-8-4	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/226944	-
dc.identifier.url	http://www.lrec-conf.org/proceedings/lrec2014/pdf/590_Paper.pdf	-
dc.language.iso	eng	-
dc.relation.conferencedate	26-31 May 2014	-
dc.relation.conferencename	International Conference on Language Resources and Evaluation (LREC)	-
dc.relation.conferenceplace	Reykjavik	-
dc.relation.firstpage	2062	-
dc.relation.lastpage	2070	-
dc.subject.keywords	Natural Language Processing	-
dc.subject.keywords	Information Extraction	-
dc.subject.keywords	Knowledge Management	-
dc.subject.singlekeyword	Natural Language Processing	*
dc.subject.singlekeyword	Information Extraction	*
dc.subject.singlekeyword	Knowledge Management	*
dc.title	T2K: a System for Automatically Extracting and Organizing Knowledge from Texts	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Conference contribution::04.01 Contribution in conference proceedings	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	285670	-
iris.orcid.lastModifiedDate	2024/03/02 03:42:52	*
iris.orcid.lastModifiedMillisecond	1709347372882	*
iris.sitodocente.maxattempts	2	-
Appears in types:	04.01 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/226944
