Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond
Montemagni S
2015
Abstract
In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora, with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) the analysis of similarities and differences between the source and target dependency annotation schemes considered; (ii) the analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) the mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a "Merged Italian Dependency Treebank" (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST-TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard, with the final aim of developing an "Italian Stanford Dependency Treebank" (ISDT).

| DC Field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Simi M | it |
| dc.authority.people | Montemagni S | it |
| dc.authority.people | Bosco C | it |
| dc.collection.id.s | 8c50ea44-be95-498f-946e-7bb5bd666b7c | * |
| dc.collection.name | 02.01 Contributo in volume (Capitolo o Saggio) | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/21 05:39:37 | - |
| dc.date.available | 2024/02/21 05:39:37 | - |
| dc.date.issued | 2015 | - |
| dc.description.abstracteng | In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state of the art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a "Merged Italian Dependency Treebank" (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST-TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an "Italian Stanford Dependency Treebank" (ISDT). | - |
| dc.description.affiliations | Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, Pisa, 56127, Italy; Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), Via G. Moruzzi 1, Pisa, 56124, Italy; Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, Torino, 10149, Italy | - |
| dc.description.allpeople | Simi, M; Montemagni, S; Bosco, C | - |
| dc.description.allpeopleoriginal | Simi M.; Montemagni S.; Bosco C. | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.doi | 10.1007/978-3-319-14206-7_1 | - |
| dc.identifier.isbn | 978-3-319-14205-0 | - |
| dc.identifier.scopus | 2-s2.0-84927143016 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/297500 | - |
| dc.identifier.url | http://www.scopus.com/inward/record.url?eid=2-s2.0-84927143016&partnerID=q2rCbXpz | - |
| dc.language.iso | eng | - |
| dc.publisher.country | CHE | - |
| dc.publisher.name | Springer International Publishing | - |
| dc.publisher.place | CH-6330 Cham (ZG) | - |
| dc.relation.alleditors | Basili, Roberto; Bosco, Cristina; Delmonte, Rodolfo; Moschitti, Alessandro; Simi, Maria | - |
| dc.relation.firstpage | 3 | - |
| dc.relation.ispartofbook | Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project | - |
| dc.relation.lastpage | 23 | - |
| dc.subject.keywords | Harmonization and merging of resources | - |
| dc.subject.keywords | Italian | - |
| dc.subject.keywords | Dependency Treebank | - |
| dc.subject.singlekeyword | Harmonization and merging of resources | * |
| dc.subject.singlekeyword | Italian | * |
| dc.subject.singlekeyword | Dependency Treebank | * |
| dc.title | Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond | en |
| dc.type.driver | info:eu-repo/semantics/bookPart | - |
| dc.type.full | 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) | it |
| dc.type.miur | 268 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 330110 | - |
| iris.orcid.lastModifiedDate | 2024/04/04 12:40:41 | * |
| iris.orcid.lastModifiedMillisecond | 1712227241158 | * |
| iris.scopus.extIssued | 2015 | - |
| iris.scopus.extTitle | Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond | - |
| iris.sitodocente.maxattempts | 2 | - |
| iris.unpaywall.doi | 10.1007/978-3-319-14206-7_1 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.metadataCallLastModified | 18/12/2025 04:02:23 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1766026943010 | - |
| iris.unpaywall.oastatus | closed | * |
| scopus.authority.anceserie | STUDIES IN COMPUTATIONAL INTELLIGENCE###1860-949X | * |
| scopus.category | 1702 | * |
| scopus.contributor.affiliation | Dipartimento di Informatica, Università di Pisa | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale Antonio Zampolli (ILC–CNR) | - |
| scopus.contributor.affiliation | Dipartimento di Informatica, Università di Torino | - |
| scopus.contributor.afid | 60028868 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60012259 | - |
| scopus.contributor.auid | 7005175069 | - |
| scopus.contributor.auid | 15056781100 | - |
| scopus.contributor.auid | 7004550793 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 109696702 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 112950585 | - |
| scopus.contributor.name | Maria | - |
| scopus.contributor.name | Simonetta | - |
| scopus.contributor.name | Cristina | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.surname | Simi | - |
| scopus.contributor.surname | Montemagni | - |
| scopus.contributor.surname | Bosco | - |
| scopus.date.issued | 2015 | * |
| scopus.description.abstracteng | In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state of the art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a “Merged Italian Dependency Treebank” (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST–TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an “Italian Stanford Dependency Treebank” (ISDT). | * |
| scopus.description.allpeopleoriginal | Simi M.; Montemagni S.; Bosco C. | * |
| scopus.differences | scopus.authority.anceserie | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.volume | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.identifier.doi | 10.1007/978-3-319-14206-7_1 | * |
| scopus.identifier.pui | 603604375 | * |
| scopus.identifier.scopus | 2-s2.0-84927143016 | * |
| scopus.journal.sourceid | 4900152708 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Springer Verlag | * |
| scopus.relation.firstpage | 3 | * |
| scopus.relation.lastpage | 23 | * |
| scopus.relation.volume | 589 | * |
| scopus.subject.keywords | Harmonization and merging of resources; Italian; Treebank; | * |
| scopus.title | Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond | * |
| scopus.titleeng | Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond | * |
| Appears in types: | 02.01 Contributo in volume (Capitolo o Saggio) | |