CNR Institutional Research Information System

Automatic Incremental Term Acquisition from Domain Corpora

Bartolini R;Giorgetti D;Lenci A;Montemagni S;Pirrelli V

2005

Abstract

We describe a technique for acquiring terms from Italian domain text corpora. It relies both on sophisticated linguistic analysis and on statistical measures applied to linguistically processed text rather than, as is usually the case, to raw text. The main advantage of this technique is that it requires minimal a priori knowledge of term structure: terms in a given domain can be explored and discovered without imposing a strict pattern-matching structure on them, and the technique can easily be extended to different domains. The approach is incremental, in that it may be iterated to discover terms of increasing complexity built on top of the terms discovered in the previous iteration. Such an incremental approach is convenient because the first step "cleans" the data of noise by eliciting the constituent terms, and term acquisition is then refined on the "skimmed" term data.
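The incremental idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical measure or the form of the linguistic analysis, so the sketch assumes that each sentence has already been reduced to a list of lemmas, scores adjacent units with local mutual information (pair frequency times pointwise mutual information), and uses thresholds hand-picked for a toy corpus. The names `score_pairs`, `merge`, `acquire_terms`, `min_freq` and `min_score` are hypothetical.

```python
import math
from collections import Counter


def score_pairs(sentences):
    """Score every adjacent pair of units; returns {pair: (score, frequency)}.

    The score is local mutual information (frequency * PMI). This measure is
    an assumption made for illustration, not taken from the paper.
    """
    unit_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        unit_counts.update(sent)
        pair_counts.update(zip(sent, sent[1:]))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), freq in pair_counts.items():
        p_ab = freq / n_pairs
        p_a = unit_counts[a] / n_units
        p_b = unit_counts[b] / n_units
        scores[(a, b)] = (freq * math.log2(p_ab / (p_a * p_b)), freq)
    return scores


def merge(sent, accepted):
    """Greedily join accepted adjacent pairs, left to right, into single units."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) in accepted:
            out.append(sent[i] + " " + sent[i + 1])
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out


def acquire_terms(sentences, iterations=3, min_freq=2, min_score=7.0):
    """Return one sorted list of newly accepted terms per iteration."""
    terms = []
    for _ in range(iterations):
        accepted = {
            pair
            for pair, (score, freq) in score_pairs(sentences).items()
            if freq >= min_freq and score >= min_score
        }
        if not accepted:
            break
        terms.append(sorted(" ".join(pair) for pair in accepted))
        # Terms found now become single units for the next iteration.
        sentences = [merge(sent, accepted) for sent in sentences]
    return terms


# Toy lemmatised corpus (invented for this sketch).
corpus = (
    [["smaltimento", "rifiuto", "pericoloso"]] * 3
    + [["raccolta", "rifiuto", "pericoloso"],
       ["trasporto", "rifiuto", "pericoloso"],
       ["rifiuto", "organico"],
       ["rifiuto", "urbano"],
       ["produrre", "rifiuto"],
       ["ridurre", "rifiuto"]]
)
terms = acquire_terms(corpus)
print(terms)
# [['rifiuto pericoloso'], ['smaltimento rifiuto pericoloso']]
```

In this toy run the first iteration accepts only the constituent term "rifiuto pericoloso", while the weaker pair "smaltimento rifiuto" falls below the threshold. Once the constituent is treated as a single unit, the second iteration finds the more complex term built on top of it.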

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Bartolini R	it
dc.authority.people	Giorgetti D	it
dc.authority.people	Lenci A	it
dc.authority.people	Montemagni S	it
dc.authority.people	Pirrelli V	it
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/20 08:21:29	-
dc.date.available	2024/02/20 08:21:29	-
dc.date.issued	2005	-
dc.description.abstracteng	We describe a technique for acquiring terms from Italian domain text corpora. It relies both on sophisticated linguistic analysis and on statistical measures applied to linguistically processed text rather than, as is usually the case, to raw text. The main advantage of this technique is that it requires minimal a priori knowledge of term structure: terms in a given domain can be explored and discovered without imposing a strict pattern-matching structure on them, and the technique can easily be extended to different domains. The approach is incremental, in that it may be iterated to discover terms of increasing complexity built on top of the terms discovered in the previous iteration. Such an incremental approach is convenient because the first step "cleans" the data of noise by eliciting the constituent terms, and term acquisition is then refined on the "skimmed" term data.	-
dc.description.affiliations	Lenci A. (Università di Pisa).	-
dc.description.allpeople	Bartolini, R; Giorgetti, D; Lenci, A; Montemagni, S; Pirrelli, V	-
dc.description.allpeopleoriginal	Bartolini R., Giorgetti D., Lenci A., Montemagni S., Pirrelli V.	-
dc.description.fulltext	none	en
dc.description.numberofauthors	5	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/431279	-
dc.language.iso	eng	-
dc.relation.conferencename	7th International conference on Terminology and Knowledge Engineering (TKE2005)	-
dc.relation.conferenceplace	Copenhagen	-
dc.relation.firstpage	293	-
dc.relation.ispartofbook	Proceedings of TKE 2005 - 7th International Conference on Terminology and Knowledge Engineering	-
dc.relation.lastpage	300	-
dc.title	Automatic Incremental Term Acquisition from Domain Corpora	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Sì, ma tipo non specificato	-
dc.ugov.descaux1	84576	-
iris.orcid.lastModifiedDate	2024/04/04 13:48:31	*
iris.orcid.lastModifiedMillisecond	1712231311807	*
iris.scopus.extIssued	2005	-
iris.scopus.extTitle	Automatic incremental term acquisition from domain corpora	-
iris.sitodocente.maxattempts	2	-
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/431279