Corpora and Computational Lexica: Integration of Different Methodologies of Lexical Knowledge Acquisition
Remo Bindi; Nicoletta Calzolari; Monica Monachini; Vito Pirrelli; Antonio Zampolli
1994
Abstract
An attempt to integrate different techniques and various perspectives on lexical knowledge acquisition from text corpora is illustrated. In this program we use three distinct methodologies to handle text data, summarized as follows: (1) Simple and traditional stochastic techniques working on pairs of words. (2) A lexicographic approach guided by the techniques mentioned in Section 1, aiming at a formal description of sense disambiguation in terms of rules. (3) More complex and sophisticated statistical methods working on sets of words (possibly belonging to the same semantic field), which allow us to gain a new perspective on the problem of sense disambiguation. The three approaches are complementary to each other and can be contextually used. The overall objective of our work is to try to integrate data and information coming from different sources, i.e. machine-readable dictionaries, text corpora, linguists' or lexicographers' knowledge, within a computational lexicon. We stress the necessity of convergence of (1) lexical and textual projects, (2) computational and traditional lexicography, and (3) statistical and rule based approaches.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | LITERARY & LINGUISTIC COMPUTING | - |
| dc.authority.people | Remo Bindi | it |
| dc.authority.people | Nicoletta Calzolari | it |
| dc.authority.people | Monica Monachini | it |
| dc.authority.people | Vito Pirrelli | it |
| dc.authority.people | Antonio Zampolli | it |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Articolo in rivista | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/18 11:05:10 | - |
| dc.date.available | 2024/02/18 11:05:10 | - |
| dc.date.issued | 1994 | - |
| dc.description.abstracteng | An attempt to integrate different techniques and various perspectives on lexical knowledge acquisition from text corpora is illustrated. In this program we use three distinct methodologies to handle text data, summarized as follows: (1) Simple and traditional stochastic techniques working on pairs of words. (2) A lexicographic approach guided by the techniques mentioned in Section 1, aiming at a formal description of sense disambiguation in terms of rules. (3) More complex and sophisticated statistical methods working on sets of words (possibly belonging to the same semantic field), which allow us to gain a new perspective on the problem of sense disambiguation. The three approaches are complementary to each other and can be contextually used. The overall objective of our work is to try to integrate data and information coming from different sources, i.e. machine-readable dictionaries, text corpora, linguists' or lexicographers' knowledge, within a computational lexicon. We stress the necessity of convergence of (1) lexical and textual projects, (2) computational and traditional lexicography, and (3) statistical and rule based approaches. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "Antonio Zampolli", CNR - Pisa | - |
| dc.description.allpeople | Bindi, Remo; Calzolari, Nicoletta; Monachini, Monica; Pirrelli, Vito; Zampolli, Antonio | - |
| dc.description.allpeopleoriginal | Remo Bindi, Nicoletta Calzolari, Monica Monachini, Vito Pirrelli and Antonio Zampolli | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 5 | - |
| dc.identifier.doi | 10.1093/llc/9.1.29 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/115891 | - |
| dc.language.iso | eng | - |
| dc.relation.firstpage | 29 | - |
| dc.relation.lastpage | 46 | - |
| dc.relation.volume | 9(1) | - |
| dc.title | Corpora and Computational Lexica: Integration of Different Methodologies of Lexical Knowledge Acquisition | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Contributo su Rivista::01.01 Articolo in rivista | it |
| dc.type.miur | 262 | - |
| dc.ugov.descaux1 | 225548 | - |
| iris.orcid.lastModifiedDate | 2024/04/04 23:06:15 | * |
| iris.orcid.lastModifiedMillisecond | 1712264775517 | * |
| iris.scopus.extIssued | 1994 | - |
| iris.scopus.extTitle | Corpora and computational lexica: Integration of different methodologies of lexical knowledge acquisition | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.doi | 10.1093/llc/9.1.29 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.metadataCallLastModified | 23/12/2025 04:02:53 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1766458973502 | - |
| iris.unpaywall.oastatus | closed | * |
| Appears in categories: | 01.01 Articolo in rivista | |


