Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies

Simi Maria; Bosco Cristina; Montemagni Simonetta
2014

Abstract

Stanford Dependencies (SD) have become a de facto standard for dependency annotation. The goal of this paper is to explore the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing two setups of the statistical parser DeSR. In the first, the parser is trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank, MIDT+) and its output is automatically converted to SD. In the second, the parser is trained directly on ISDT. Experiments testing the reliability and effectiveness of the two strategies show that the parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, performs slightly better than the parser trained directly on ISDT. A non-negligible advantage of the first strategy is that semi-automatic extensions of the training resource are easier and more consistent to carry out with a reduced dependency tagset. Preliminary experiments on generating the collapsed and propagated SD representation are also reported.
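The collapsed and propagated SD representation mentioned above can be illustrated with a minimal sketch. It assumes English-style basic SD labels (`prep`, `pobj`, `cc`, `conj`) and a toy Italian sentence. The `Dep` tuple, the functions `collapse_prepositions` and `propagate_conjuncts`, and the example are hypothetical illustrations, not the conversion procedure used in the paper. The sketch folds each preposition into the label of the arc it heads, then copies the relation entering a first conjunct onto its other conjuncts.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One dependency arc (hypothetical container, not an ISDT or DeSR API)."""
    head: int   # index of the governor token (0 = artificial root)
    dep: int    # index of the dependent token
    label: str  # relation name


def collapse_prepositions(tokens: dict[int, str], deps: list[Dep]) -> list[Dep]:
    """Fold prep(gov, p) + pobj(p, obj) into a single prep_<p>(gov, obj) arc.

    The preposition token itself drops out of the collapsed graph.
    """
    pobj_by_prep = {d.head: d for d in deps if d.label == "pobj"}
    absorbed = set()
    out = []
    for d in deps:
        if d.label == "prep" and d.dep in pobj_by_prep:
            obj = pobj_by_prep[d.dep]
            out.append(Dep(d.head, obj.dep, "prep_" + tokens[d.dep].lower()))
            absorbed.add(obj)  # this pobj arc is now part of the collapsed arc
        else:
            out.append(d)
    return [d for d in out if d not in absorbed]


def propagate_conjuncts(deps: list[Dep]) -> list[Dep]:
    """Copy the relation entering a first conjunct onto each of its conjuncts."""
    out = list(deps)
    for c in deps:
        if c.label != "conj":
            continue
        for d in deps:
            # d attaches the first conjunct (c.head) to its governor;
            # root is skipped so the graph keeps a single root
            if d.dep == c.head and d.label not in ("conj", "cc", "root"):
                new = Dep(d.head, c.dep, d.label)
                if new not in out:
                    out.append(new)
    return out


# Toy example: "Maria parla di Roma e Milano" ("Maria talks about Rome and Milan")
tokens = {1: "Maria", 2: "parla", 3: "di", 4: "Roma", 5: "e", 6: "Milano"}
basic = [
    Dep(2, 1, "nsubj"), Dep(0, 2, "root"), Dep(2, 3, "prep"),
    Dep(3, 4, "pobj"), Dep(4, 5, "cc"), Dep(4, 6, "conj"),
]
collapsed = collapse_prepositions(tokens, basic)
propagated = propagate_conjuncts(collapsed)

if __name__ == "__main__":
    for arc in propagated:
        print(f"{arc.label}({tokens.get(arc.head, 'ROOT')}, {tokens[arc.dep]})")
```

Running it prints, among other arcs, `prep_di(parla, Roma)` and the propagated `prep_di(parla, Milano)`. This is deliberately simplified: full SD also merges `cc` into labels such as `conj_e` and handles many constructions not covered here.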
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Simi Maria en
dc.authority.people Bosco Cristina en
dc.authority.people Montemagni Simonetta en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 09:38:03 -
dc.date.available 2024/02/20 09:38:03 -
dc.date.firstsubmission 2025/01/10 15:57:31 *
dc.date.issued 2014 -
dc.date.submission 2025/01/10 15:57:31 *
dc.description.abstracteng Stanford Dependencies (SD) represent nowadays a de facto standard as far as dependency annotation is concerned. The goal of this paper is to explore pros and cons of different strategies for generating SD annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank or MIDT+) and whose output was automatically converted to SD, with the results of the parser directly trained on ISDT. Experiments carried out to test reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependencies repertoire, whose output can be easily converted to SD, is slightly higher than the performance of a parser directly trained on ISDT. A non-negligible advantage of the first strategy for generating SD annotated texts is that semi-automatic extensions of the training resource are more easily and consistently carried out with respect to a reduced dependency tagset. Preliminary experiments carried out for generating the collapsed and propagated SD representation are also reported. -
dc.description.affiliations Università di Pisa; Università di Torino; Istituto di Linguistica Computazionale "Antonio Zampolli" -
dc.description.allpeople Simi, Maria; Bosco, Cristina; Montemagni, Simonetta -
dc.description.allpeopleoriginal Simi Maria; Bosco Cristina; Montemagni Simonetta en
dc.description.fulltext open en
dc.description.numberofauthors 3 -
dc.identifier.isbn 978-2-9517408-8-4 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/294411 -
dc.identifier.url http://www.lrec-conf.org/proceedings/lrec2014/pdf/818_Paper.pdf en
dc.language.iso eng en
dc.publisher.country FRA en
dc.publisher.name European Language Resources Association ELRA en
dc.publisher.place Paris en
dc.relation.alleditors Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis en
dc.relation.conferencedate 26-31 May 2014 en
dc.relation.conferencename Ninth International Conference on Language Resources and Evaluation (LREC'14) en
dc.relation.conferenceplace Reykjavik, Iceland en
dc.relation.ispartofbook Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) en
dc.subject.keywords Italian Treebank -
dc.subject.keywords Harmonization and Merging of Resources -
dc.subject.keywords Stanford Dependencies -
dc.subject.singlekeyword Italian Treebank *
dc.subject.singlekeyword Harmonization and Merging of Resources *
dc.subject.singlekeyword Stanford Dependencies *
dc.title Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified en
dc.ugov.descaux1 329779 -
iris.mediafilter.data 2025/04/13 03:29:06 *
iris.orcid.lastModifiedDate 2025/01/10 15:57:59 *
iris.orcid.lastModifiedMillisecond 1736521079494 *
iris.scopus.extIssued 2014 -
iris.scopus.extTitle Less is more? Towards a reduced inventory of categories for training a parser for the Italian stanford dependencies -
iris.sitodocente.maxattempts 2 -
Appears in item types: 04.01 Contribution in conference proceedings
Files in this item:
File Size Format
prod_329779-doc_101526.pdf 314.39 kB Adobe PDF
Access: open access
Description: Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/294411