Corpora and Computational Lexica: Integration of Different Methodologies of Lexical Knowledge Acquisition

Remo Bindi; Monica Monachini; Vito Pirrelli
1994

Abstract

We illustrate an attempt to integrate different techniques and perspectives on lexical knowledge acquisition from text corpora. In this program we use three distinct methodologies to handle text data: (1) simple, traditional stochastic techniques working on pairs of words; (2) a lexicographic approach, guided by the techniques in (1), aiming at a formal description of sense disambiguation in terms of rules; (3) more sophisticated statistical methods working on sets of words (possibly belonging to the same semantic field), which offer a new perspective on the problem of sense disambiguation. The three approaches are complementary and can be used together. The overall objective of our work is to integrate data and information coming from different sources (machine-readable dictionaries, text corpora, and linguists' or lexicographers' knowledge) within a computational lexicon. We stress the need for convergence of (1) lexical and textual projects, (2) computational and traditional lexicography, and (3) statistical and rule-based approaches.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/115891