Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies
Montemagni Simonetta
2014
Abstract
Stanford Dependencies (SD) represent nowadays a de facto standard as far as dependency annotation is concerned. The goal of this paper is to explore pros and cons of different strategies for generating SD annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank or MIDT+) and whose output was automatically converted to SD, with the results of the parser directly trained on ISDT. Experiments carried out to test reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependencies repertoire, whose output can be easily converted to SD, is slightly higher than the performance of a parser directly trained on ISDT. A non-negligible advantage of the first strategy for generating SD annotated texts is that semi-automatic extensions of the training resource are more easily and consistently carried out with respect to a reduced dependency tagset. Preliminary experiments carried out for generating the collapsed and propagated SD representation are also reported.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Simi Maria | en |
| dc.authority.people | Bosco Cristina | en |
| dc.authority.people | Montemagni Simonetta | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/20 09:38:03 | - |
| dc.date.available | 2024/02/20 09:38:03 | - |
| dc.date.firstsubmission | 2025/01/10 15:57:31 | * |
| dc.date.issued | 2014 | - |
| dc.date.submission | 2025/01/10 15:57:31 | * |
| dc.description.abstracteng | Stanford Dependencies (SD) represent nowadays a de facto standard as far as dependency annotation is concerned. The goal of this paper is to explore pros and cons of different strategies for generating SD annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). This is done by comparing the performance of a statistical parser (DeSR) trained on a simpler resource (the augmented version of the Merged Italian Dependency Treebank or MIDT+) and whose output was automatically converted to SD, with the results of the parser directly trained on ISDT. Experiments carried out to test reliability and effectiveness of the two strategies show that the performance of a parser trained on the reduced dependencies repertoire, whose output can be easily converted to SD, is slightly higher than the performance of a parser directly trained on ISDT. A non-negligible advantage of the first strategy for generating SD annotated texts is that semi-automatic extensions of the training resource are more easily and consistently carried out with respect to a reduced dependency tagset. Preliminary experiments carried out for generating the collapsed and propagated SD representation are also reported. | - |
| dc.description.affiliations | Università di Pisa; Università di Torino; Istituto di Linguistica Computazionale "Antonio Zampolli" | - |
| dc.description.allpeople | Simi, Maria; Bosco, Cristina; Montemagni, Simonetta | - |
| dc.description.allpeopleoriginal | Simi Maria; Bosco Cristina; Montemagni Simonetta | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.isbn | 978-2-9517408-8-4 | en |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/294411 | - |
| dc.identifier.url | http://www.lrec-conf.org/proceedings/lrec2014/pdf/818_Paper.pdf | en |
| dc.language.iso | eng | en |
| dc.publisher.country | FRA | en |
| dc.publisher.name | European Language Resources Association ELRA | en |
| dc.publisher.place | Paris | en |
| dc.relation.alleditors | Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis | en |
| dc.relation.conferencedate | 26-31 May 2014 | en |
| dc.relation.conferencename | Ninth International Conference on Language Resources and Evaluation (LREC'14) | en |
| dc.relation.conferenceplace | Reykjavik, Iceland | en |
| dc.relation.ispartofbook | Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) | en |
| dc.subject.keywords | Italian Treebank | - |
| dc.subject.keywords | Harmonization and Merging of Resources | - |
| dc.subject.keywords | Stanford Dependencies | - |
| dc.subject.singlekeyword | Italian Treebank | * |
| dc.subject.singlekeyword | Harmonization and Merging of Resources | * |
| dc.subject.singlekeyword | Stanford Dependencies | * |
| dc.title | Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | en |
| dc.ugov.descaux1 | 329779 | - |
| iris.mediafilter.data | 2025/04/13 03:29:06 | * |
| iris.orcid.lastModifiedDate | 2025/01/10 15:57:59 | * |
| iris.orcid.lastModifiedMillisecond | 1736521079494 | * |
| iris.scopus.extIssued | 2014 | - |
| iris.scopus.extTitle | Less is more? Towards a reduced inventory of categories for training a parser for the Italian stanford dependencies | - |
| iris.sitodocente.maxattempts | 2 | - |
| Appears in types: | 04.01 Contribution in conference proceedings | |

| File | Size | Format | |
|---|---|---|---|
| prod_329779-doc_101526.pdf (open access). Description: Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies. License: Creative Commons | 314.39 kB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


