Automatic Incremental Term Acquisition from Domain Corpora

Bartolini R; Montemagni S; Pirrelli V
2005

Abstract

We describe a technique for acquiring terms from Italian domain text corpora. It relies both on sophisticated linguistic analysis and on statistical measures applied to linguistically processed text rather than to raw text, as is usually the case. The main advantage of this technique is that it requires minimal a priori knowledge of term structure: terms in a given domain can be explored and discovered without imposing a strict pattern-matching structure on them, and the technique can easily be extended to different domains. The approach is incremental, in that it may be iterated to discover terms of increasing complexity built on top of the terms discovered in the previous iteration. An incremental approach is convenient because the first step "cleans" the data of noise by eliciting the constituent terms, and later steps then refine term acquisition on the "skimmed" term data.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Bartolini R it
dc.authority.people Giorgetti D it
dc.authority.people Lenci A it
dc.authority.people Montemagni S it
dc.authority.people Pirrelli V it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 08:21:29 -
dc.date.available 2024/02/20 08:21:29 -
dc.date.issued 2005 -
dc.description.affiliations Lenci A. (Università di Pisa). -
dc.description.allpeople Bartolini, R; Giorgetti, D; Lenci, A; Montemagni, S; Pirrelli, V -
dc.description.allpeopleoriginal Bartolini R., Giorgetti D., Lenci A., Montemagni S., Pirrelli V. -
dc.description.fulltext none en
dc.description.numberofauthors 5 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/431279 -
dc.language.iso eng -
dc.relation.conferencename 7th International conference on Terminology and Knowledge Engineering (TKE2005) -
dc.relation.conferenceplace Copenhagen -
dc.relation.firstpage 293 -
dc.relation.ispartofbook Proceedings of TKE 2005 - 7th International Conference on Terminology and Knowledge Engineering -
dc.relation.lastpage 300 -
dc.title Automatic Incremental Term Acquisition from Domain Corpora en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 84576 -
iris.scopus.extIssued 2005 -
iris.scopus.extTitle Automatic incremental term acquisition from domain corpora -
Appears in types: 04.01 Contribution in conference proceedings
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/431279