Dissecting Treebanks to Uncover Typological Trends. A Multilingual Comparative Approach

Alzetta C.; Dell'Orletta F.; Montemagni S.; Venturi G.
2019

Abstract

In recent years, linguistic typology has started to attract the interest of the community working on cross- and multi-lingual NLP as a way to tackle the bottleneck caused by the lack of annotated data for many languages. Typological information is mostly acquired from publicly accessible typological databases manually constructed by linguists. As reported in Ponti et al. (2018), despite the abundant information they contain for many languages, these resources suffer from two main shortcomings: their limited coverage and the discrete nature of their features (only "the majority value rather than the full range of possible values and their corresponding frequencies" is reported). Corpus-based studies can help to automatically acquire quantitative typological evidence that might be exploited for polyglot NLP. Recently, the availability of corpora annotated according to a cross-linguistically consistent annotation scheme, such as the one developed in the Universal Dependencies (UD) project, is prompting new comparative linguistic studies aimed at identifying similarities as well as idiosyncrasies among typologically different languages (Nivre, 2015). The line of research described here aims to acquire quantitative typological evidence from UD treebanks through a multilingual contrastive approach.
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Alzetta C it
dc.authority.people Dell'Orletta F it
dc.authority.people Montemagni S it
dc.authority.people Venturi G it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno (Contribution in conference proceedings) *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/21 06:07:00 -
dc.date.available 2024/02/21 06:07:00 -
dc.date.issued 2019 -
dc.description.affiliations Università degli Studi di Genova; Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR) -
dc.description.allpeople Alzetta C.; Dell'Orletta F.; Montemagni S.; Venturi G. -
dc.description.allpeopleoriginal Alzetta C., Dell'Orletta F., Montemagni S., Venturi G. -
dc.description.fulltext none en
dc.description.numberofauthors 4 -
dc.identifier.isbn 978-1-950737-29-1 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/403587 -
dc.identifier.url https://typology-and-nlp.github.io/2019/assets/2019/papers/5.pdf -
dc.language.iso eng -
dc.miur.last.status.update 2024-08-27T13:37:59Z *
dc.relation.conferencedate 01/08/2019 -
dc.relation.conferencename 1st TyP-NLP: The Workshop on Typology for Polyglot NLP, ACL workshop -
dc.relation.conferenceplace Firenze -
dc.relation.firstpage 1 -
dc.relation.lastpage 3 -
dc.relation.numberofpages 3 -
dc.subject.keywords Natural Language Processing -
dc.subject.keywords Linguistic Typology -
dc.subject.singlekeyword Natural Language Processing *
dc.subject.singlekeyword Linguistic Typology *
dc.title Dissecting Treebanks to Uncover Typological Trends. A Multilingual Comparative Approach en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno (Conference contribution::Contribution in conference proceedings) it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 423881 -
iris.orcid.lastModifiedDate 2024/03/01 17:27:14 *
iris.orcid.lastModifiedMillisecond 1709310434675 *
iris.scopus.extIssued 2019 -
iris.scopus.extTitle Towards the identification of propaedeutic relations in textbooks -
iris.sitodocente.maxattempts 10 -
Appears in the types: 04.01 Contributo in Atti di convegno (Contribution in conference proceedings)
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/403587