CNR Institutional Research Information System

Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond

Simi M; Montemagni S; Bosco C

2015

Abstract

In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a "Merged Italian Dependency Treebank" (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST-TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an "Italian Stanford Dependency Treebank" (ISDT).

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Simi M	it
dc.authority.people	Montemagni S	it
dc.authority.people	Bosco C	it
dc.collection.id.s	8c50ea44-be95-498f-946e-7bb5bd666b7c	*
dc.collection.name	02.01 Contributo in volume (Capitolo o Saggio)	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/21 05:39:37	-
dc.date.available	2024/02/21 05:39:37	-
dc.date.issued	2015	-
dc.description.abstracteng	In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a "Merged Italian Dependency Treebank" (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST-TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an "Italian Stanford Dependency Treebank" (ISDT).	-
dc.description.affiliations	Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, Pisa, 56127, Italy; Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), Via G. Moruzzi 1, Pisa, 56124, Italy; Dipartimento di Informatica, Università di Torino, Corso Svizzera 185, Torino, 10149, Italy	-
dc.description.allpeople	Simi, M; Montemagni, S; Bosco, C	-
dc.description.allpeopleoriginal	Simi M.; Montemagni S.; Bosco C.	-
dc.description.fulltext	none	en
dc.description.numberofauthors	3	-
dc.identifier.doi	10.1007/978-3-319-14206-7_1	-
dc.identifier.isbn	978-3-319-14205-0	-
dc.identifier.scopus	2-s2.0-84927143016	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/297500	-
dc.identifier.url	http://www.scopus.com/inward/record.url?eid=2-s2.0-84927143016&partnerID=q2rCbXpz	-
dc.language.iso	eng	-
dc.publisher.country	CHE	-
dc.publisher.name	Springer International Publishing	-
dc.publisher.place	CH-6330 Cham (ZG)	-
dc.relation.alleditors	Basili, Roberto; Bosco, Cristina; Delmonte, Rodolfo; Moschitti, Alessandro; Simi, Maria	-
dc.relation.firstpage	3	-
dc.relation.ispartofbook	Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project	-
dc.relation.lastpage	23	-
dc.subject.keywords	Harmonization and merging of resources	-
dc.subject.keywords	Italian	-
dc.subject.keywords	Dependency Treebank	-
dc.subject.singlekeyword	Harmonization and merging of resources	*
dc.subject.singlekeyword	Italian	*
dc.subject.singlekeyword	Dependency Treebank	*
dc.title	Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond	en
dc.type.driver	info:eu-repo/semantics/bookPart	-
dc.type.full	02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio)	it
dc.type.miur	268	-
dc.type.referee	Sì, ma tipo non specificato	-
dc.ugov.descaux1	330110	-
iris.orcid.lastModifiedDate	2024/04/04 12:40:41	*
iris.orcid.lastModifiedMillisecond	1712227241158	*
iris.scopus.extIssued	2015	-
iris.scopus.extTitle	Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond	-
iris.sitodocente.maxattempts	2	-
iris.unpaywall.doi	10.1007/978-3-319-14206-7_1	*
iris.unpaywall.isoa	false	*
iris.unpaywall.journalisindoaj	false	*
iris.unpaywall.metadataCallLastModified	18/12/2025 04:02:23	-
iris.unpaywall.metadataCallLastModifiedMillisecond	1766026943010	-
iris.unpaywall.oastatus	closed	*
scopus.authority.anceserie	STUDIES IN COMPUTATIONAL INTELLIGENCE###1860-949X	*
scopus.category	1702	*
scopus.contributor.affiliation	Dipartimento di Informatica, Università di Pisa	-
scopus.contributor.affiliation	Istituto di Linguistica Computazionale Antonio Zampolli (ILC–CNR)	-
scopus.contributor.affiliation	Dipartimento di Informatica, Università di Torino	-
scopus.contributor.afid	60028868	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60012259	-
scopus.contributor.auid	7005175069	-
scopus.contributor.auid	15056781100	-
scopus.contributor.auid	7004550793	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.dptid	109696702	-
scopus.contributor.dptid		-
scopus.contributor.dptid	112950585	-
scopus.contributor.name	Maria	-
scopus.contributor.name	Simonetta	-
scopus.contributor.name	Cristina	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.surname	Simi	-
scopus.contributor.surname	Montemagni	-
scopus.contributor.surname	Bosco	-
scopus.date.issued	2015	*
scopus.description.abstracteng	In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora with the final aim of constructing a bigger treebank for the Italian language. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) The analysis of similarities and differences of considered source and target dependency annotation schemes; (ii) The analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) The mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two different case studies. The first one was aimed at constructing a “Merged Italian Dependency Treebank” (MIDT) starting from existing Italian dependency treebanks, namely TUT and ISST–TANL. The second case study, still ongoing, consists in the conversion of the MIDT resource into the Stanford Dependencies de facto standard with the final aim of developing an “Italian Stanford Dependency Treebank” (ISDT).	*
scopus.description.allpeopleoriginal	Simi M.; Montemagni S.; Bosco C.	*
scopus.differences	scopus.authority.anceserie	*
scopus.differences	scopus.publisher.name	*
scopus.differences	scopus.subject.keywords	*
scopus.differences	scopus.description.abstracteng	*
scopus.differences	scopus.relation.volume	*
scopus.document.type	ar	*
scopus.document.types	ar	*
scopus.identifier.doi	10.1007/978-3-319-14206-7_1	*
scopus.identifier.pui	603604375	*
scopus.identifier.scopus	2-s2.0-84927143016	*
scopus.journal.sourceid	4900152708	*
scopus.language.iso	eng	*
scopus.publisher.name	Springer Verlag	*
scopus.relation.firstpage	3	*
scopus.relation.lastpage	23	*
scopus.relation.volume	589	*
scopus.subject.keywords	Harmonization and merging of resources; Italian; Treebank;	*
scopus.title	Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond	*
scopus.titleeng	Harmonizing and merging Italian treebanks: Towards a merged Italian dependency treebank and beyond	*
Appears in types:	02.01 Contributo in volume (Capitolo o Saggio)

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/297500
