CNR Institutional Research Information System

Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies

Simi Maria; Bosco Cristina; Montemagni Simonetta

2014

Abstract

Stanford Dependencies (SD) are now a de facto standard for dependency annotation. The goal of this paper is to explore the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank, or MIDT+), whose output is automatically converted to SD, with the results of the parser trained directly on ISDT. Experiments carried out to test the reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, is slightly higher than that of a parser trained directly on ISDT. A non-negligible advantage of the first strategy for generating SD-annotated texts is that semi-automatic extensions of the training resource are carried out more easily and consistently with a reduced dependency tagset. Preliminary experiments on generating the collapsed and propagated SD representation are also reported.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Simi Maria	en
dc.authority.people	Bosco Cristina	en
dc.authority.people	Montemagni Simonetta	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contribution in conference proceedings	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/20 09:38:03	-
dc.date.available	2024/02/20 09:38:03	-
dc.date.firstsubmission	2025/01/10 15:57:31	*
dc.date.issued	2014	-
dc.date.submission	2025/01/10 15:57:31	*
dc.description.abstracteng	Stanford Dependencies (SD) are now a de facto standard for dependency annotation. The goal of this paper is to explore the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank, or MIDT+), whose output is automatically converted to SD, with the results of the parser trained directly on ISDT. Experiments carried out to test the reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, is slightly higher than that of a parser trained directly on ISDT. A non-negligible advantage of the first strategy for generating SD-annotated texts is that semi-automatic extensions of the training resource are carried out more easily and consistently with a reduced dependency tagset. Preliminary experiments on generating the collapsed and propagated SD representation are also reported.	-
dc.description.affiliations	Università di Pisa; Università di Torino; Istituto di Linguistica Computazionale "Antonio Zampolli"	-
dc.description.allpeople	Simi, Maria; Bosco, Cristina; Montemagni, Simonetta	-
dc.description.allpeopleoriginal	Simi Maria; Bosco Cristina; Montemagni Simonetta	en
dc.description.fulltext	open	en
dc.description.numberofauthors	3	-
dc.identifier.isbn	978-2-9517408-8-4	en
dc.identifier.uri	https://hdl.handle.net/20.500.14243/294411	-
dc.identifier.url	http://www.lrec-conf.org/proceedings/lrec2014/pdf/818_Paper.pdf	en
dc.language.iso	eng	en
dc.publisher.country	FRA	en
dc.publisher.name	European Language Resources Association ELRA	en
dc.publisher.place	Paris	en
dc.relation.alleditors	Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis	en
dc.relation.conferencedate	26-31 May 2014	en
dc.relation.conferencename	Ninth International Conference on Language Resources and Evaluation (LREC'14)	en
dc.relation.conferenceplace	Reykjavik, Iceland	en
dc.relation.ispartofbook	Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)	en
dc.subject.keywords	Italian Treebank	-
dc.subject.keywords	Harmonization and Merging of Resources	-
dc.subject.keywords	Stanford Dependencies	-
dc.subject.singlekeyword	Italian Treebank	*
dc.subject.singlekeyword	Harmonization and Merging of Resources	*
dc.subject.singlekeyword	Stanford Dependencies	*
dc.title	Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Conference contribution::04.01 Contribution in conference proceedings	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	en
dc.ugov.descaux1	329779	-
iris.mediafilter.data	2025/04/13 03:29:06	*
iris.orcid.lastModifiedDate	2025/01/10 15:57:59	*
iris.orcid.lastModifiedMillisecond	1736521079494	*
iris.scopus.extIssued	2014	-
iris.scopus.extTitle	Less is more? Towards a reduced inventory of categories for training a parser for the Italian stanford dependencies	-
iris.sitodocente.maxattempts	2	-
Appears in types:	04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
prod_329779-doc_101526.pdf (open access; description: Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies; license: Creative Commons)	314.39 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/294411