CNR Institutional Research Information System

information others than those usually found in machine readable dictionaries or manually encoded by lexicographers are urgently needed. Different sources must be exploited if we want to overcome the lexical bottleneck of Natural Language Processing. Very interesting data can be found by processing large textual corpora, where the actual usage of the language can be truly investigated. These data refer, typically, to various kinds of syntagmatic relations, which are particularly problematic in many NLP applications. The paper describes how this data can be at least partially extracted by processing and analysing large text corpora, with quantitative/statistic methods. We describe two types of quantitative analyses whose aim is to extract information on the strength of association between two words, and on fixed phrases and idioms. We observe how the measure of the association ratio provides quantitative evidence to a number of lexical, syntactic and semantic relationships between word-pairs. One of the claims is that the linguistic information embodied in all these quite different types of lexical collocations can be helpful for lexical disambiguation in analysis and crucial for lexical selection in generation. This is a step towards a more objective lexicography and a more data-based linguistics.

Acquisition of lexical information from a large textual Italian corpus

Calzolari N;Bindi R

2003

Abstract

information others than those usually found in machine readable dictionaries or manually encoded by lexicographers are urgently needed. Different sources must be exploited if we want to overcome the lexical bottleneck of Natural Language Processing. Very interesting data can be found by processing large textual corpora, where the actual usage of the language can be truly investigated. These data refer, typically, to various kinds of syntagmatic relations, which are particularly problematic in many NLP applications. The paper describes how this data can be at least partially extracted by processing and analysing large text corpora, with quantitative/statistic methods. We describe two types of quantitative analyses whose aim is to extract information on the strength of association between two words, and on fixed phrases and idioms. We observe how the measure of the association ratio provides quantitative evidence to a number of lexical, syntactic and semantic relationships between word-pairs. One of the claims is that the linguistic information embodied in all these quite different types of lexical collocations can be helpful for lexical disambiguation in analysis and crucial for lexical selection in generation. This is a step towards a more objective lexicography and a more data-based linguistics.

Scheda breve

Scheda completa

Scheda completa (DC)

Campo DC	Valore	Lingua
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Calzolari N	it
dc.authority.people	Bindi R	it
dc.collection.id.s	b3f88f24-048a-4e43-8ab1-6697b90e068e	*
dc.collection.name	01.01 Articolo in rivista	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/19 00:44:52	-
dc.date.available	2024/02/19 00:44:52	-
dc.date.issued	2003	-
dc.description.abstract	information others than those usually found in machine readable dictionaries or manually encoded by lexicographers are urgently needed. Different sources must be exploited if we want to overcome the lexical bottleneck of Natural Language Processing. Very interesting data can be found by processing large textual corpora, where the actual usage of the language can be truly investigated. These data refer, typically, to various kinds of syntagmatic relations, which are particularly problematic in many NLP applications. The paper describes how this data can be at least partially extracted by processing and analysing large text corpora, with quantitative/statistic methods. We describe two types of quantitative analyses whose aim is to extract information on the strength of association between two words, and on fixed phrases and idioms. We observe how the measure of the association ratio provides quantitative evidence to a number of lexical, syntactic and semantic relationships between word-pairs. One of the claims is that the linguistic information embodied in all these quite different types of lexical collocations can be helpful for lexical disambiguation in analysis and crucial for lexical selection in generation. This is a step towards a more objective lexicography and a more data-based linguistics.	-
dc.description.allpeople	Calzolari N.; Bindi R.	-
dc.description.allpeopleoriginal	Calzolari N., Bindi R.	-
dc.description.fulltext	none	en
dc.description.numberofauthors	2	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/37647	-
dc.relation.firstpage	117	-
dc.relation.lastpage	131	-
dc.relation.volume	16-17	-
dc.title	Acquisition of lexical information from a large textual Italian corpus	en
dc.type.driver	info:eu-repo/semantics/article	-
dc.type.full	01 Contributo su Rivista::01.01 Articolo in rivista	it
dc.type.miur	262	-
dc.type.referee	Sì, ma tipo non specificato	-
dc.ugov.descaux1	64459	-
iris.orcid.lastModifiedDate	2024/03/01 12:59:01	*
iris.orcid.lastModifiedMillisecond	1709294341629	*
iris.sitodocente.maxattempts	1	-
Appare nelle tipologie:	01.01 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/37647

Citazioni

ND

ND

ND

social impact