Corpora and Computational Lexica: Integration of Different Methodologies of Lexical Knowledge Acquisition

Remo Bindi; Nicoletta Calzolari; Monica Monachini; Vito Pirrelli; Antonio Zampolli
1994

Abstract

This paper illustrates an attempt to integrate different techniques and perspectives on lexical knowledge acquisition from text corpora. In this program we use three distinct methodologies to handle text data: (1) simple, traditional stochastic techniques working on pairs of words; (2) a lexicographic approach, guided by the techniques in (1), aiming at a formal description of sense disambiguation in terms of rules; (3) more complex statistical methods working on sets of words (possibly belonging to the same semantic field), which offer a new perspective on the problem of sense disambiguation. The three approaches are complementary and can be used together. Our overall objective is to integrate, within a computational lexicon, data and information from different sources, i.e. machine-readable dictionaries, text corpora, and linguists' or lexicographers' knowledge. We stress the need for convergence of (1) lexical and textual projects, (2) computational and traditional lexicography, and (3) statistical and rule-based approaches.
DC Field Value Language
dc.authority.ancejournal LITERARY & LINGUISTIC COMPUTING -
dc.authority.people Remo Bindi it
dc.authority.people Nicoletta Calzolari it
dc.authority.people Monica Monachini it
dc.authority.people Vito Pirrelli it
dc.authority.people Antonio Zampolli it
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Journal article *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/18 11:05:10 -
dc.date.available 2024/02/18 11:05:10 -
dc.date.issued 1994 -
dc.description.affiliations Istituto di Linguistica Computazionale "Antonio Zampolli", CNR - Pisa -
dc.description.allpeople Bindi, Remo; Calzolari, Nicoletta; Monachini, Monica; Pirrelli, Vito; Zampolli, Antonio -
dc.description.allpeopleoriginal Remo Bindi, Nicoletta Calzolari, Monica Monachini, Vito Pirrelli and Antonio Zampolli -
dc.description.fulltext none en
dc.description.numberofauthors 5 -
dc.identifier.doi 10.1093/llc/9.1.29 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/115891 -
dc.language.iso eng -
dc.relation.firstpage 29 -
dc.relation.lastpage 46 -
dc.relation.volume 9(1) -
dc.title Corpora and Computational Lexica: Integration of Different Methodologies of Lexical Knowledge Acquisition en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Journal contribution::01.01 Journal article it
dc.type.miur 262 -
dc.ugov.descaux1 225548 -
iris.orcid.lastModifiedDate 2024/04/04 23:06:15 *
iris.orcid.lastModifiedMillisecond 1712264775517 *
iris.scopus.extIssued 1994 -
iris.scopus.extTitle Corpora and computational lexica: Integration of different methodologies of lexical knowledge acquisition -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1093/llc/9.1.29 *
iris.unpaywall.isoa false *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.metadataCallLastModified 23/12/2025 04:02:53 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1766458973502 -
iris.unpaywall.oastatus closed *
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/115891