Acquisition of lexical information from a large textual Italian corpus

Calzolari N., Bindi R.
2003

Abstract

Information other than that usually found in machine-readable dictionaries or manually encoded by lexicographers is urgently needed. Different sources must be exploited if we want to overcome the “lexical bottleneck” of Natural Language Processing. Valuable data can be found by processing large textual corpora, where actual language usage can be investigated directly. These data typically concern various kinds of syntagmatic relations, which are particularly problematic in many NLP applications. The paper describes how such data can be at least partially extracted by processing and analysing large text corpora with quantitative/statistical methods. We describe two types of quantitative analysis that aim to extract information on the strength of association between two words and on fixed phrases and idioms. We observe how the association ratio provides quantitative evidence for a number of lexical, syntactic and semantic relationships between word pairs. One of the claims is that the linguistic information embodied in all these quite different types of lexical collocation can be helpful for lexical disambiguation in analysis and crucial for lexical selection in generation. This is a step towards a more objective lexicography and a more “data-based” linguistics.
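The abstract does not spell out how the association ratio is computed. As a rough illustration only, the sketch below assumes the usual mutual-information-style definition, log2( P(x,y) / (P(x) P(y)) ), where P(x,y) is estimated from how often y follows x within a short window. The function name `association_ratios`, the `window` and `min_pair_count` parameters, and the toy token list are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5, min_pair_count=1):
    """Score ordered word pairs (x, y) where y follows x within `window` tokens.

    Assumed definition (not confirmed by the record):
        ratio(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    with every probability estimated as a raw count divided by corpus size.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()

    for i, x in enumerate(tokens):
        # y ranges over the (window - 1) tokens that follow x
        for y in tokens[i + 1 : i + window]:
            pair_counts[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_counts.items():
        if f_xy < min_pair_count:
            continue  # very rare pairs give unreliable scores
        p_xy = f_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Toy token list, purely illustrative; a real study would use a large corpus.
    toy = "casa editrice pubblica libri e la casa editrice vende libri".split()
    ranked = sorted(association_ratios(toy, window=2).items(),
                    key=lambda item: item[1], reverse=True)
    for pair, score in ranked[:3]:
        print(pair, round(score, 3))
```

Scores well above zero suggest that the two words co-occur more often than chance would predict, which is the kind of evidence the abstract refers to. On small samples the estimate is unstable, so a frequency threshold such as `min_pair_count` is a common precaution.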
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Calzolari N it
dc.authority.people Bindi R it
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/19 00:44:52 -
dc.date.available 2024/02/19 00:44:52 -
dc.date.issued 2003 -
dc.description.allpeople Calzolari N.; Bindi R. -
dc.description.allpeopleoriginal Calzolari N., Bindi R. -
dc.description.fulltext none en
dc.description.numberofauthors 2 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/37647 -
dc.relation.firstpage 117 -
dc.relation.lastpage 131 -
dc.relation.volume 16-17 -
dc.title Acquisition of lexical information from a large textual Italian corpus en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.miur 262 -
dc.type.referee Yes, but type not specified -
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/37647