Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond

Simi M; Montemagni S; Bosco C
2015

Abstract

In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora, with the final aim of constructing a larger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) the analysis of similarities and differences between the source and target dependency annotation schemes under consideration; (ii) the analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) the mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two case studies. The first was aimed at constructing a "Merged Italian Dependency Treebank" (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST-TANL. The second case study, still ongoing, consists of converting the MIDT resource into the Stanford Dependencies de facto standard, with the final aim of developing an "Italian Stanford Dependency Treebank" (ISDT).
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Simi M it
dc.authority.people Montemagni S it
dc.authority.people Bosco C it
dc.collection.id.s 8c50ea44-be95-498f-946e-7bb5bd666b7c *
dc.collection.name 02.01 Contributo in volume (Capitolo o Saggio) *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/21 05:39:37 -
dc.date.available 2024/02/21 05:39:37 -
dc.date.issued 2015 -
dc.description.abstracteng In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a "Merged Italian Dependency Treebank" (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST-TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an "Italian Stanford Dependency Treebank" (ISDT). -
dc.description.affiliations Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, Pisa, 56127, Italy; Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), Via G. Moruzzi 1, Pisa, 56124, Italy; Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, Torino, 10149, Italy -
dc.description.allpeople Simi, M; Montemagni, S; Bosco, C -
dc.description.allpeopleoriginal Simi M.; Montemagni S.; Bosco C. -
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.doi 10.1007/978-3-319-14206-7_1 -
dc.identifier.isbn 978-3-319-14205-0 -
dc.identifier.scopus 2-s2.0-84927143016 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/297500 -
dc.identifier.url http://www.scopus.com/inward/record.url?eid=2-s2.0-84927143016&partnerID=q2rCbXpz -
dc.language.iso eng -
dc.publisher.country CHE -
dc.publisher.name Springer International Publishing -
dc.publisher.place CH-6330 Cham (ZG) -
dc.relation.alleditors Basili, Roberto; Bosco, Cristina; Delmonte, Rodolfo; Moschitti, Alessandro; Simi, Maria -
dc.relation.firstpage 3 -
dc.relation.ispartofbook Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project -
dc.relation.lastpage 23 -
dc.subject.keywords Harmonization and merging of resources -
dc.subject.keywords Italian -
dc.subject.keywords Dependency Treebank -
dc.subject.singlekeyword Harmonization and merging of resources *
dc.subject.singlekeyword Italian *
dc.subject.singlekeyword Dependency Treebank *
dc.title Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond en
dc.type.driver info:eu-repo/semantics/bookPart -
dc.type.full 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) it
dc.type.miur 268 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 330110 -
iris.orcid.lastModifiedDate 2024/04/04 12:40:41 *
iris.orcid.lastModifiedMillisecond 1712227241158 *
iris.scopus.extIssued 2015 -
iris.scopus.extTitle Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond -
iris.sitodocente.maxattempts 2 -
iris.unpaywall.doi 10.1007/978-3-319-14206-7_1 *
iris.unpaywall.isoa false *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.metadataCallLastModified 18/12/2025 04:02:23 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1766026943010 -
iris.unpaywall.oastatus closed *
scopus.authority.anceserie STUDIES IN COMPUTATIONAL INTELLIGENCE###1860-949X *
scopus.category 1702 *
scopus.contributor.affiliation Dipartimento di Informatica, Università di Pisa -
scopus.contributor.affiliation Istituto di Linguistica Computazionale Antonio Zampolli (ILC–CNR) -
scopus.contributor.affiliation Dipartimento di Informatica, Università di Torino -
scopus.contributor.afid 60028868 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60012259 -
scopus.contributor.auid 7005175069 -
scopus.contributor.auid 15056781100 -
scopus.contributor.auid 7004550793 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid 109696702 -
scopus.contributor.dptid -
scopus.contributor.dptid 112950585 -
scopus.contributor.name Maria -
scopus.contributor.name Simonetta -
scopus.contributor.name Cristina -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation -
scopus.contributor.surname Simi -
scopus.contributor.surname Montemagni -
scopus.contributor.surname Bosco -
scopus.date.issued 2015 *
scopus.description.abstracteng In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a “Merged Italian Dependency Treebank” (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST–TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an “Italian Stanford Dependency Treebank” (ISDT). *
scopus.description.allpeopleoriginal Simi M.; Montemagni S.; Bosco C. *
scopus.differences scopus.authority.anceserie *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.relation.volume *
scopus.document.type ar *
scopus.document.types ar *
scopus.identifier.doi 10.1007/978-3-319-14206-7_1 *
scopus.identifier.pui 603604375 *
scopus.identifier.scopus 2-s2.0-84927143016 *
scopus.journal.sourceid 4900152708 *
scopus.language.iso eng *
scopus.publisher.name Springer Verlag *
scopus.relation.firstpage 3 *
scopus.relation.lastpage 23 *
scopus.relation.volume 589 *
scopus.subject.keywords Harmonization and merging of resources; Italian; Treebank; *
scopus.title Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond *
scopus.titleeng Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond *
Appears in types: 02.01 Contributo in volume (Capitolo o Saggio)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/297500