Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches
Chiara Alzetta; Simonetta Montemagni; Marta Sartor; Giulia Venturi
2024
Abstract
The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. This method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed an 18-karat resource: although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | LANGUAGE RESOURCES AND EVALUATION | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Chiara Alzetta | en |
| dc.authority.people | Simonetta Montemagni | en |
| dc.authority.people | Marta Sartor | en |
| dc.authority.people | Giulia Venturi | en |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Articolo in rivista | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2024/07/22 15:41:15 | - |
| dc.date.available | 2024/07/22 15:41:15 | - |
| dc.date.firstsubmission | 2024/07/11 15:46:12 | * |
| dc.date.issued | 2024 | - |
| dc.date.submission | 2025/03/06 15:21:46 | * |
| dc.description.abstracteng | The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks. | - |
| dc.description.allpeople | Alzetta, Chiara; Montemagni, Simonetta; Sartor, Marta; Venturi, Giulia | - |
| dc.description.allpeopleoriginal | Chiara Alzetta, Simonetta Montemagni, Marta Sartor, Giulia Venturi | en |
| dc.description.fulltext | open | en |
| dc.description.international | no | en |
| dc.description.numberofauthors | 4 | - |
| dc.identifier.doi | 10.1007/s10579-024-09748-6 | en |
| dc.identifier.isi | WOS:001263433400001 | en |
| dc.identifier.scopus | 2-s2.0-85197693530 | en |
| dc.identifier.source | manual | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/484441 | - |
| dc.identifier.url | https://link.springer.com/content/pdf/10.1007/s10579-024-09748-6.pdf | en |
| dc.language.iso | eng | en |
| dc.relation.medium | ELETTRONICO | en |
| dc.relation.numberofpages | 25 | en |
| dc.subject.keywordseng | Universal dependencies treebanks, Annotation revision, Italian parliamentary debates, Linguistic annotation | - |
| dc.subject.singlekeyword | Universal dependencies treebanks | * |
| dc.subject.singlekeyword | Annotation revision | * |
| dc.subject.singlekeyword | Italian parliamentary debates | * |
| dc.subject.singlekeyword | Linguistic annotation | * |
| dc.title | Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | en |
| dc.type.circulation | Internazionale | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Contributo su Rivista::01.01 Articolo in rivista | it |
| dc.type.impactfactor | si | en |
| dc.type.miur | 262 | - |
| dc.type.referee | Esperti anonimi | en |
| iris.isi.extIssued | 2025 | - |
| iris.isi.extTitle | Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | - |
| iris.mediafilter.data | 2025/04/03 04:14:46 | * |
| iris.orcid.lastModifiedDate | 2026/03/04 15:42:08 | * |
| iris.orcid.lastModifiedMillisecond | 1772635328708 | * |
| iris.scopus.extIssued | 2025 | - |
| iris.scopus.extTitle | Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoahost | publisher | * |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.1007/s10579-024-09748-6 | * |
| iris.unpaywall.hosttype | publisher | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.landingpage | https://doi.org/10.1007/s10579-024-09748-6 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 05/03/2026 04:55:10 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1772682910810 | - |
| iris.unpaywall.oastatus | hybrid | * |
| iris.unpaywall.pdfurl | https://link.springer.com/content/pdf/10.1007/s10579-024-09748-6.pdf | * |
| isi.authority.ancejournal | LANGUAGE RESOURCES AND EVALUATION###1574-020X | * |
| isi.category | EV | * |
| isi.contributor.affiliation | Ist Linguist Computazionale A Zampolli ILC CNR Ita | - |
| isi.contributor.affiliation | Ist Linguist Computazionale A Zampolli ILC CNR Ita | - |
| isi.contributor.affiliation | Ist Linguist Computazionale A Zampolli ILC CNR Ita | - |
| isi.contributor.affiliation | Ist Linguist Computazionale A Zampolli ILC CNR Ita | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.name | Chiara | - |
| isi.contributor.name | Simonetta | - |
| isi.contributor.name | Marta | - |
| isi.contributor.name | Giulia | - |
| isi.contributor.researcherId | KVX-9760-2024 | - |
| isi.contributor.researcherId | B-8000-2015 | - |
| isi.contributor.researcherId | FVK-1943-2022 | - |
| isi.contributor.researcherId | AAY-3932-2020 | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | - | |
| isi.contributor.surname | Alzetta | - |
| isi.contributor.surname | Montemagni | - |
| isi.contributor.surname | Sartor | - |
| isi.contributor.surname | Venturi | - |
| isi.date.issued | 2025 | * |
| isi.description.abstracteng | The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks. | * |
| isi.description.allpeopleoriginal | Alzetta, C; Montemagni, S; Sartor, M; Venturi, G; | * |
| isi.document.sourcetype | WOS.SCI | * |
| isi.document.type | Article | * |
| isi.document.types | Article | * |
| isi.identifier.doi | 10.1007/s10579-024-09748-6 | * |
| isi.identifier.eissn | 1574-0218 | * |
| isi.identifier.isi | WOS:001263433400001 | * |
| isi.journal.journaltitle | LANGUAGE RESOURCES AND EVALUATION | * |
| isi.journal.journaltitleabbrev | LANG RESOUR EVAL | * |
| isi.language.original | English | * |
| isi.publisher.place | VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS | * |
| isi.relation.firstpage | 1659 | * |
| isi.relation.issue | 2 | * |
| isi.relation.lastpage | 1683 | * |
| isi.relation.volume | 59 | * |
| isi.title | Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | * |
| scopus.authority.ancejournal | LANGUAGE RESOURCES AND EVALUATION###1574-020X | * |
| scopus.category | 1203 | * |
| scopus.category | 3304 | * |
| scopus.category | 3310 | * |
| scopus.category | 3309 | * |
| scopus.contributor.affiliation | (ILC-CNR) ItaliaNLP Lab | - |
| scopus.contributor.affiliation | (ILC-CNR) ItaliaNLP Lab | - |
| scopus.contributor.affiliation | (ILC-CNR) ItaliaNLP Lab | - |
| scopus.contributor.affiliation | (ILC-CNR) ItaliaNLP Lab | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.auid | 57192938832 | - |
| scopus.contributor.auid | 15056781100 | - |
| scopus.contributor.auid | 59207233400 | - |
| scopus.contributor.auid | 27568199800 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Chiara | - |
| scopus.contributor.name | Simonetta | - |
| scopus.contributor.name | Marta | - |
| scopus.contributor.name | Giulia | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale “A. Zampolli”; | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale “A. Zampolli”; | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale “A. Zampolli”; | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale “A. Zampolli”; | - |
| scopus.contributor.surname | Alzetta | - |
| scopus.contributor.surname | Montemagni | - |
| scopus.contributor.surname | Sartor | - |
| scopus.contributor.surname | Venturi | - |
| scopus.date.issued | 2025 | * |
| scopus.description.abstracteng | The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks. | * |
| scopus.description.allpeopleoriginal | Alzetta C.; Montemagni S.; Sartor M.; Venturi G. | * |
| scopus.differences | scopus.relation.lastpage | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.firstpage | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.issue | * |
| scopus.differences | scopus.date.issued | * |
| scopus.differences | scopus.relation.volume | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.funding.funders | 501100007514 - Università di Pisa; | * |
| scopus.identifier.doi | 10.1007/s10579-024-09748-6 | * |
| scopus.identifier.eissn | 1574-0218 | * |
| scopus.identifier.pui | 2030495515 | * |
| scopus.identifier.scopus | 2-s2.0-85197693530 | * |
| scopus.journal.sourceid | 145663 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Springer Science and Business Media B.V. | * |
| scopus.relation.firstpage | 1659 | * |
| scopus.relation.issue | 2 | * |
| scopus.relation.lastpage | 1683 | * |
| scopus.relation.volume | 59 | * |
| scopus.subject.keywords | Annotation revision; Italian parliamentary debates; Linguistic annotation; Universal dependencies treebanks; | * |
| scopus.title | Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | * |
| scopus.titleeng | Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | * |

| File | Type | License | Size | Format |
|---|---|---|---|---|
| pubblicato-s10579-024-09748-6.pdf (open access) | Publisher's version (PDF) | Creative Commons | 876.11 kB | Adobe PDF |

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


