Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks
D Brunato; G Venturi
2022
Abstract
This paper investigates linguistic complexity across natural languages from a corpus-based perspective and relies on the assumptions of linguistic profiling as a methodological framework. We focus in particular on the domain of syntactic complexity and analyze the distribution of a set of features taken as proxies of complexity phenomena at the sentence level, which were extracted from 63 treebanks annotated according to the Universal Dependencies formalism. This dataset guarantees that the features considered are modeling the same linguistic phenomena in different treebanks, allowing reliable comparison among languages. We show that our approach is able to identify tendencies of structural proximity between languages not necessarily in line with typologically-supported classification, thus shedding light on new corpus-based findings.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | LINGUISTICS VANGUARD | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | D Brunato | en |
| dc.authority.people | G Venturi | en |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Journal article (Articolo in rivista) | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Not assigned | * |
| dc.contributor.area | Not assigned | * |
| dc.date.accessioned | 2024/02/20 15:01:30 | - |
| dc.date.available | 2024/02/20 15:01:30 | - |
| dc.date.firstsubmission | 2025/01/24 12:43:53 | * |
| dc.date.issued | 2022 | - |
| dc.date.submission | 2025/01/29 10:10:41 | * |
| dc.description.abstracteng | This paper investigates linguistic complexity across natural languages from a corpus-based perspective and relies on the assumptions of linguistic profiling as a methodological framework. We focus in particular on the domain of syntactic complexity and analyze the distribution of a set of features taken as proxies of complexity phenomena at the sentence level, which were extracted from 63 treebanks annotated according to the Universal Dependencies formalism. This dataset guarantees that the features considered are modeling the same linguistic phenomena in different treebanks, allowing reliable comparison among languages. We show that our approach is able to identify tendencies of structural proximity between languages not necessarily in line with typologically-supported classification, thus shedding light on new corpus-based findings. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "A. Zampolli" | - |
| dc.description.allpeople | Brunato, D; Venturi, G | - |
| dc.description.allpeopleoriginal | D. Brunato; G. Venturi | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 2 | - |
| dc.identifier.doi | 10.1515/lingvan-2021-0017 | en |
| dc.identifier.isi | WOS:000870822600001 | - |
| dc.identifier.scopus | 2-s2.0-85141200922 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/420475 | - |
| dc.identifier.url | https://www.degruyter.com/document/doi/10.1515/lingvan-2021-0017/html | en |
| dc.language.iso | eng | en |
| dc.miur.last.status.update | 2024-07-08T15:59:26Z | * |
| dc.relation.firstpage | 59 | en |
| dc.relation.lastpage | 72 | en |
| dc.relation.medium | ELECTRONIC | en |
| dc.relation.numberofpages | 13 | en |
| dc.subject.keywordseng | Linguistic Complexity | - |
| dc.subject.keywordseng | Linguistic Profiling | - |
| dc.subject.keywordseng | Universal Dependencies | - |
| dc.subject.singlekeyword | Linguistic Complexity | * |
| dc.subject.singlekeyword | Linguistic Profiling | * |
| dc.subject.singlekeyword | Universal Dependencies | * |
| dc.title | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | en |
| dc.type.circulation | International | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Contributo su Rivista::01.01 Articolo in rivista | it |
| dc.type.impactfactor | yes | en |
| dc.type.miur | 262 | - |
| dc.type.referee | Anonymous experts | en |
| dc.ugov.descaux1 | 472409 | - |
| iris.isi.extIssued | 2023 | - |
| iris.isi.extTitle | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | - |
| iris.mediafilter.data | 2025/04/08 04:25:31 | * |
| iris.orcid.lastModifiedDate | 2025/07/20 01:50:16 | * |
| iris.orcid.lastModifiedMillisecond | 1752969016963 | * |
| iris.scopus.extIssued | 2023 | - |
| iris.scopus.extTitle | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.doi | 10.1515/lingvan-2021-0017 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.metadataCallLastModified | 22/07/2025 04:25:51 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1753151151277 | - |
| iris.unpaywall.oastatus | closed | * |
| isi.authority.ancejournal | LINGUISTICS VANGUARD###2199-174X | * |
| isi.category | OT | * |
| isi.category | OY | * |
| isi.contributor.affiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.affiliation | Inst Computat Linguist A Zampolli ILC CNR | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.name | Dominique | - |
| isi.contributor.name | Giulia | - |
| isi.contributor.researcherId | MCK-5206-2025 | - |
| isi.contributor.researcherId | AAY-3932-2020 | - |
| isi.contributor.subaffiliation | ItaliaNLP Lab | - |
| isi.contributor.subaffiliation | ItaliaNLP Lab | - |
| isi.contributor.surname | Brunato | - |
| isi.contributor.surname | Venturi | - |
| isi.date.issued | 2023 | * |
| isi.description.abstracteng | This paper investigates linguistic complexity across natural languages from a corpus-based perspective and relies on the assumptions of linguistic profiling as a methodological framework. We focus in particular on the domain of syntactic complexity and analyze the distribution of a set of features taken as proxies of complexity phenomena at sentence level, which were extracted from 63 treebanks annotated according to the Universal Dependencies formalism. This dataset guarantees that the features considered are modeling the same linguistic phenomena in different treebanks, allowing reliable comparison among languages. We show that our approach is able to identify tendencies of structural proximity between languages not necessarily in line with typologically-supported classification, thus shedding light on new corpus-based findings. | * |
| isi.description.allpeopleoriginal | Brunato, D; Venturi, G; | * |
| isi.document.sourcetype | WOS.SSCI | * |
| isi.document.type | Article | * |
| isi.document.types | Article | * |
| isi.identifier.doi | 10.1515/lingvan-2021-0017 | * |
| isi.identifier.isi | WOS:000870822600001 | * |
| isi.journal.journaltitle | LINGUISTICS VANGUARD | * |
| isi.journal.journaltitleabbrev | LINGUIST VANGUARD | * |
| isi.language.original | English | * |
| isi.publisher.place | GENTHINER STRASSE 13, D-10785 BERLIN, GERMANY | * |
| isi.relation.firstpage | 59 | * |
| isi.relation.lastpage | 72 | * |
| isi.relation.volume | 9 | * |
| isi.title | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | * |
| scopus.authority.ancejournal | LINGUISTICS VANGUARD###2199-174X | * |
| scopus.category | 1203 | * |
| scopus.category | 3310 | * |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.auid | 55237740200 | - |
| scopus.contributor.auid | 27568199800 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.dptid | 121833164 | - |
| scopus.contributor.name | Dominique | - |
| scopus.contributor.name | Giulia | - |
| scopus.contributor.subaffiliation | Institute for Computational Linguistics A. Zampolli (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Institute for Computational Linguistics A. Zampolli (ILC-CNR); | - |
| scopus.contributor.surname | Brunato | - |
| scopus.contributor.surname | Venturi | - |
| scopus.date.issued | 2023 | * |
| scopus.description.abstracteng | This paper investigates linguistic complexity across natural languages from a corpus-based perspective and relies on the assumptions of linguistic profiling as a methodological framework. We focus in particular on the domain of syntactic complexity and analyze the distribution of a set of features taken as proxies of complexity phenomena at sentence level, which were extracted from 63 treebanks annotated according to the Universal Dependencies formalism. This dataset guarantees that the features considered are modeling the same linguistic phenomena in different treebanks, allowing reliable comparison among languages. We show that our approach is able to identify tendencies of structural proximity between languages not necessarily in line with typologically-supported classification, thus shedding light on new corpus-based findings. | * |
| scopus.description.allpeopleoriginal | Brunato D.; Venturi G. | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.date.issued | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.issue | * |
| scopus.differences | scopus.relation.volume | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.identifier.doi | 10.1515/lingvan-2021-0017 | * |
| scopus.identifier.eissn | 2199-174X | * |
| scopus.identifier.pui | 2020999860 | * |
| scopus.identifier.scopus | 2-s2.0-85141200922 | * |
| scopus.journal.sourceid | 21100860908 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Walter de Gruyter GmbH | * |
| scopus.relation.firstpage | 59 | * |
| scopus.relation.issue | 1 s | * |
| scopus.relation.lastpage | 72 | * |
| scopus.relation.volume | 9 | * |
| scopus.subject.keywords | linguistic complexity; linguistic profiling; syntactic domain; universal dependencies; | * |
| scopus.title | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | * |
| scopus.titleeng | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | * |

Appears in types: 01.01 Journal article (Articolo in rivista)

| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| prod_472409-doc_192275.pdf (open access) | Why is this language complex? Cherry-pick the optimal set of features in multilingual treebanks | Post-print document | Creative Commons | 2.74 MB | Adobe PDF |

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


