CNR Institutional Research Information System

Dissecting Treebanks to Uncover Typological Trends. A Multilingual Comparative Approach

Alzetta C;Dell'Orletta F;Montemagni S;Venturi G

2019

Abstract

In recent years, linguistic typology has started to attract the interest of the community working on cross- and multi-lingual NLP as a way to tackle the bottleneck caused by the lack of annotated data for many languages. Typological information is mostly acquired from publicly accessible typological databases, manually constructed by linguists. As reported in Ponti et al. (2018), despite the abundant information they contain for many languages, these resources suffer from two main shortcomings: their limited coverage and the discrete nature of their features (only "the majority value rather than the full range of possible values and their corresponding frequencies" is reported). Corpus-based studies can help to automatically acquire quantitative typological evidence that might be exploited for polyglot NLP. Recently, the availability of corpora annotated following a cross-linguistically consistent annotation scheme, such as the one developed in the Universal Dependencies (UD) project, has been prompting new comparative linguistic studies aimed at identifying similarities as well as idiosyncrasies among typologically different languages (Nivre, 2015). The line of research described here aims to acquire quantitative typological evidence from UD treebanks through a multilingual contrastive approach.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Alzetta C	it
dc.authority.people	Dell'Orletta F	it
dc.authority.people	Montemagni S	it
dc.authority.people	Venturi G	it
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/21 06:07:00	-
dc.date.available	2024/02/21 06:07:00	-
dc.date.issued	2019	-
dc.description.abstracteng	Over the last years, linguistic typology started attracting the interest of the community working on cross- and multi-lingual NLP as a way to tackle the bottleneck deriving from the lack of annotated data for many languages. Typological information is mostly acquired from publicly accessible typological databases, manually constructed by linguists. As reported in Ponti et al. (2018), despite the abundant information contained in them for many languages, these resources suffer from two main shortcomings, i.e. their limited coverage and the discrete nature of features (only "the majority value rather than the full range of possible values and their corresponding frequencies" is reported). Corpus-based studies can help to automatically acquire quantitative typological evidence which might be exploited for polyglot NLP. Recently, the availability of corpora annotated following a cross-linguistically consistent annotation scheme such as the one developed in the Universal Dependencies project is prompting new comparative linguistic studies aimed to identify similarities as well as idiosyncrasies among typologically different languages (Nivre, 2015). The line of research described here is aimed at acquiring quantitative typological evidence from UD treebanks through a multilingual contrastive approach.	-
dc.description.affiliations	Università degli Studi di Genova; Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR)	-
dc.description.allpeople	Alzetta C.; Dell'Orletta F.; Montemagni S.; Venturi G.	-
dc.description.allpeopleoriginal	Alzetta C., Dell'Orletta F., Montemagni S., Venturi G.	-
dc.description.fulltext	none	en
dc.description.numberofauthors	4	-
dc.identifier.isbn	978-1-950737-29-1	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/403587	-
dc.identifier.url	https://typology-and-nlp.github.io/2019/assets/2019/papers/5.pdf	-
dc.language.iso	eng	-
dc.miur.last.status.update	2024-08-27T13:37:59Z	*
dc.relation.conferencedate	01/08/2019	-
dc.relation.conferencename	1st TyP-NLP: The Workshop on Typology for Polyglot NLP, ACL workshop	-
dc.relation.conferenceplace	Firenze	-
dc.relation.firstpage	1	-
dc.relation.lastpage	3	-
dc.relation.numberofpages	3	-
dc.subject.keywords	Natural Language Processing	-
dc.subject.keywords	Linguistic Typology	-
dc.subject.singlekeyword	Natural Language Processing	*
dc.subject.singlekeyword	Linguistic Typology	*
dc.title	Dissecting Treebanks to Uncover Typological Trends. A Multilingual Comparative Approach	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	423881	-
iris.orcid.lastModifiedDate	2024/03/01 17:27:14	*
iris.orcid.lastModifiedMillisecond	1709310434675	*
iris.scopus.extIssued	2019	-
iris.scopus.extTitle	Towards the identification of propaedeutic relations in textbooks	-
iris.sitodocente.maxattempts	10	-
Appears in types:	04.01 Contributo in Atti di convegno (contribution in conference proceedings)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/403587
