Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches

Chiara Alzetta; Simonetta Montemagni; Marta Sartor; Giulia Venturi
2024

Abstract

The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks.
DC field Value Language
dc.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Chiara Alzetta en
dc.authority.people Simonetta Montemagni en
dc.authority.people Marta Sartor en
dc.authority.people Giulia Venturi en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2024/07/22 15:41:15 -
dc.date.available 2024/07/22 15:41:15 -
dc.date.firstsubmission 2024/07/11 15:46:12 *
dc.date.issued 2024 -
dc.date.submission 2025/03/06 15:21:46 *
dc.description.abstracteng The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks. -
dc.description.allpeople Alzetta, Chiara; Montemagni, Simonetta; Sartor, Marta; Venturi, Giulia -
dc.description.allpeopleoriginal Chiara Alzetta, Simonetta Montemagni, Marta Sartor, Giulia Venturi en
dc.description.fulltext open en
dc.description.international no en
dc.description.numberofauthors 4 -
dc.identifier.doi 10.1007/s10579-024-09748-6 en
dc.identifier.isi WOS:001263433400001 en
dc.identifier.scopus 2-s2.0-85197693530 en
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/484441 -
dc.identifier.url https://link.springer.com/content/pdf/10.1007/s10579-024-09748-6.pdf en
dc.language.iso eng en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 25 en
dc.subject.keywordseng Universal dependencies treebanks, Annotation revision, Italian parliamentary debates, Linguistic annotation -
dc.subject.singlekeyword Universal dependencies treebanks *
dc.subject.singlekeyword Annotation revision *
dc.subject.singlekeyword Italian parliamentary debates *
dc.subject.singlekeyword Linguistic annotation *
dc.title Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.impactfactor si en
dc.type.miur 262 -
dc.type.referee Esperti anonimi en
iris.isi.extIssued 2025 -
iris.isi.extTitle Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches -
iris.mediafilter.data 2025/04/03 04:14:46 *
iris.orcid.lastModifiedDate 2026/03/04 15:42:08 *
iris.orcid.lastModifiedMillisecond 1772635328708 *
iris.scopus.extIssued 2025 -
iris.scopus.extTitle Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoahost publisher *
iris.unpaywall.bestoaversion publishedVersion *
iris.unpaywall.doi 10.1007/s10579-024-09748-6 *
iris.unpaywall.hosttype publisher *
iris.unpaywall.isoa true *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.landingpage https://doi.org/10.1007/s10579-024-09748-6 *
iris.unpaywall.license cc-by *
iris.unpaywall.metadataCallLastModified 05/03/2026 04:55:10 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1772682910810 -
iris.unpaywall.oastatus hybrid *
iris.unpaywall.pdfurl https://link.springer.com/content/pdf/10.1007/s10579-024-09748-6.pdf *
isi.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION###1574-020X *
isi.category EV *
isi.contributor.affiliation Ist Linguist Computazionale A Zampolli ILC CNR Ita -
isi.contributor.affiliation Ist Linguist Computazionale A Zampolli ILC CNR Ita -
isi.contributor.affiliation Ist Linguist Computazionale A Zampolli ILC CNR Ita -
isi.contributor.affiliation Ist Linguist Computazionale A Zampolli ILC CNR Ita -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.name Chiara -
isi.contributor.name Simonetta -
isi.contributor.name Marta -
isi.contributor.name Giulia -
isi.contributor.researcherId KVX-9760-2024 -
isi.contributor.researcherId B-8000-2015 -
isi.contributor.researcherId FVK-1943-2022 -
isi.contributor.researcherId AAY-3932-2020 -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation -
isi.contributor.surname Alzetta -
isi.contributor.surname Montemagni -
isi.contributor.surname Sartor -
isi.contributor.surname Venturi -
isi.date.issued 2025 *
isi.description.abstracteng The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks. *
isi.description.allpeopleoriginal Alzetta, C; Montemagni, S; Sartor, M; Venturi, G; *
isi.document.sourcetype WOS.SCI *
isi.document.type Article *
isi.document.types Article *
isi.identifier.doi 10.1007/s10579-024-09748-6 *
isi.identifier.eissn 1574-0218 *
isi.identifier.isi WOS:001263433400001 *
isi.journal.journaltitle LANGUAGE RESOURCES AND EVALUATION *
isi.journal.journaltitleabbrev LANG RESOUR EVAL *
isi.language.original English *
isi.publisher.place VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS *
isi.relation.firstpage 1659 *
isi.relation.issue 2 *
isi.relation.lastpage 1683 *
isi.relation.volume 59 *
isi.title Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches *
scopus.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION###1574-020X *
scopus.category 1203 *
scopus.category 3304 *
scopus.category 3310 *
scopus.category 3309 *
scopus.contributor.affiliation (ILC-CNR) ItaliaNLP Lab -
scopus.contributor.affiliation (ILC-CNR) ItaliaNLP Lab -
scopus.contributor.affiliation (ILC-CNR) ItaliaNLP Lab -
scopus.contributor.affiliation (ILC-CNR) ItaliaNLP Lab -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 57192938832 -
scopus.contributor.auid 15056781100 -
scopus.contributor.auid 59207233400 -
scopus.contributor.auid 27568199800 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Chiara -
scopus.contributor.name Simonetta -
scopus.contributor.name Marta -
scopus.contributor.name Giulia -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “A. Zampolli”; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “A. Zampolli”; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “A. Zampolli”; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “A. Zampolli”; -
scopus.contributor.surname Alzetta -
scopus.contributor.surname Montemagni -
scopus.contributor.surname Sartor -
scopus.contributor.surname Venturi -
scopus.date.issued 2025 *
scopus.description.abstracteng The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates, linguistically annotated based on the Universal Dependencies (UD) framework. The resource comprises 20,460 tokens and represents a hybrid language variety that is underrepresented in the UD initiative. ParlaMint-It results from a manual revision process that relies on a semi-automatic methodology able to identify sentences that are most likely to contain inconsistencies and recurrent error patterns generated by the automatic annotation. Such a method made the revision process faster and more efficient than revising the entire treebank. In addition, it allowed the identification and correction of annotation errors resulting from linguistic constructions inconsistently represented in UD treebanks and from characteristics specific to parliamentary speeches. Hence, the treebank is deemed as an 18-karat resource, since, although not fully manually revised, it is a valuable resource for researchers working on Italian language processing tasks. *
scopus.description.allpeopleoriginal Alzetta C.; Montemagni S.; Sartor M.; Venturi G. *
scopus.differences scopus.relation.lastpage *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.firstpage *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.relation.issue *
scopus.differences scopus.date.issued *
scopus.differences scopus.relation.volume *
scopus.document.type ar *
scopus.document.types ar *
scopus.funding.funders 501100007514 - Università di Pisa; *
scopus.identifier.doi 10.1007/s10579-024-09748-6 *
scopus.identifier.eissn 1574-0218 *
scopus.identifier.pui 2030495515 *
scopus.identifier.scopus 2-s2.0-85197693530 *
scopus.journal.sourceid 145663 *
scopus.language.iso eng *
scopus.publisher.name Springer Science and Business Media B.V. *
scopus.relation.firstpage 1659 *
scopus.relation.issue 2 *
scopus.relation.lastpage 1683 *
scopus.relation.volume 59 *
scopus.subject.keywords Annotation revision; Italian parliamentary debates; Linguistic annotation; Universal dependencies treebanks; *
scopus.title Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches *
scopus.titleeng Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches *
Files in this product:
File Size Format
pubblicato-s10579-024-09748-6.pdf 876.11 kB Adobe PDF

open access

Type: Publisher's version (PDF)
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/484441
Citations
  • PMC ND
  • Scopus 0
  • ISI 1