ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe

Agnoloni, Tommaso; Bartolini, Roberto
2024

Abstract

The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancements made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, the use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and the use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.
DC Field Value Language
dc.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.orgunit Istituto di Informatica Giuridica e Sistemi Giudiziari - IGSG en
dc.authority.people Erjavec, Tomaž en
dc.authority.people Kopp, Matyáš en
dc.authority.people Ljubešić, Nikola en
dc.authority.people Kuzman, Taja en
dc.authority.people Rayson, Paul en
dc.authority.people Osenova, Petya en
dc.authority.people Ogrodniczuk, Maciej en
dc.authority.people Çöltekin, Çağrı en
dc.authority.people Koržinek, Danijel en
dc.authority.people Meden, Katja en
dc.authority.people Skubic, Jure en
dc.authority.people Rupnik, Peter en
dc.authority.people Agnoloni, Tommaso en
dc.authority.people Aires, José en
dc.authority.people Barkarson, Starkaður en
dc.authority.people Bartolini, Roberto en
dc.authority.people Bel, Núria en
dc.authority.people Pérez, María Calzada en
dc.authority.people Darģis, Roberts en
dc.authority.people Diwersy, Sascha en
dc.authority.people Gavriilidou, Maria en
dc.authority.people Heusden, Ruben van en
dc.authority.people Iruskieta, Mikel en
dc.authority.people Kahusk, Neeme en
dc.authority.people Kryvenko, Anna en
dc.authority.people Ligeti-Nagy, Noémi en
dc.authority.people Magariños, Carmen en
dc.authority.people Mölder, Martin en
dc.authority.people Navarretta, Costanza en
dc.authority.people Simov, Kiril en
dc.authority.people Tungland, Lars Magne en
dc.authority.people Tuominen, Jouni en
dc.authority.people Vidler, John en
dc.authority.people Vladu, Adina Ioana en
dc.authority.people Wissik, Tanja en
dc.authority.people Yrjänäinen, Väinö en
dc.authority.people Fišer, Darja en
dc.authority.project CLARIN en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di Informatica Giuridica e Sistemi Giudiziari - IGSG *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.appartenenza.mi 1108 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2024/08/27 10:49:32 -
dc.date.available 2024/08/27 10:49:32 -
dc.date.firstsubmission 2024/07/05 18:36:55 *
dc.date.issued 2024 -
dc.date.submission 2025/03/07 15:04:33 *
dc.description.abstracteng The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed. -
dc.description.allpeople Erjavec, Tomaž; Kopp, Matyáš; Ljubešić, Nikola; Kuzman, Taja; Rayson, Paul; Osenova, Petya; Ogrodniczuk, Maciej; Çöltekin, Çağrı; Koržinek, Danijel; Meden, Katja; Skubic, Jure; Rupnik, Peter; Agnoloni, Tommaso; Aires, José; Barkarson, Starkaður; Bartolini, Roberto; Bel, Núria; Pérez, María Calzada; Darģis, Roberts; Diwersy, Sascha; Gavriilidou, Maria; Heusden, Ruben van; Iruskieta, Mikel; Kahusk, Neeme; Kryvenko, Anna; Ligeti-Nagy, Noémi; Magariños, Carmen; Mölder, Martin; Navarretta, Costanza; Simov, Kiril; Tungland, Lars Magne; Tuominen, Jouni; Vidler, John; Vladu, Adina Ioana; Wissik, Tanja; Yrjänäinen, Väinö; Fišer, Darja -
dc.description.allpeopleoriginal Erjavec, Tomaž; Kopp, Matyáš; Ljubešić, Nikola; Kuzman, Taja; Rayson, Paul; Osenova, Petya; Ogrodniczuk, Maciej; Çöltekin, Çağrı; Koržinek, Danijel; Meden, Katja; Skubic, Jure; Rupnik, Peter; Agnoloni, Tommaso; Aires, José; Barkarson, Starkaður; Bartolini, Roberto; Bel, Núria; Pérez, María Calzada; Darģis, Roberts; Diwersy, Sascha; Gavriilidou, Maria; Heusden, Ruben van; Iruskieta, Mikel; Kahusk, Neeme; Kryvenko, Anna; Ligeti-Nagy, Noémi; Magariños, Carmen; Mölder, Martin; Navarretta, Costanza; Simov, Kiril; Tungland, Lars Magne; Tuominen, Jouni; Vidler, John; Vladu, Adina Ioana; Wissik, Tanja; Yrjänäinen, Väinö; Fišer, Darja en
dc.description.fulltext restricted en
dc.description.numberofauthors 37 -
dc.identifier.doi 10.21203/rs.3.rs-4176128/v1 en
dc.identifier.isi WOS:001385018200001 -
dc.identifier.scopus 2-s2.0-85213520565 en
dc.identifier.source crossref *
dc.identifier.uri https://hdl.handle.net/20.500.14243/483041 -
dc.language.iso eng en
dc.relation.medium ELETTRONICO en
dc.relation.projectAcronym - en
dc.relation.projectAwardNumber - en
dc.relation.projectAwardTitle CLARIN en
dc.relation.projectFunderName - en
dc.relation.projectFundingStream - en
dc.subject.keywordseng Parliamentary proceedings -
dc.subject.keywordseng TEI -
dc.subject.keywordseng Comparable corpora -
dc.subject.singlekeyword Parliamentary proceedings *
dc.subject.singlekeyword TEI *
dc.subject.singlekeyword Comparable corpora *
dc.title ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.impactfactor si en
dc.type.miur 262 -
iris.isi.extIssued 2025 -
iris.isi.extTitle ParlaMint II: advancing comparable parliamentary corpora across Europe -
iris.mediafilter.data 2025/03/23 03:18:27 *
iris.orcid.lastModifiedDate 2025/09/06 01:09:30 *
iris.orcid.lastModifiedMillisecond 1757113770069 *
iris.scopus.extIssued 2025 -
iris.scopus.extTitle ParlaMint II: advancing comparable parliamentary corpora across Europe -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoaversion acceptedVersion *
iris.unpaywall.doi 10.21203/rs.3.rs-4176128/v1 *
iris.unpaywall.isoa true *
iris.unpaywall.landingpage https://doi.org/10.21203/rs.3.rs-4176128/v1 *
iris.unpaywall.license cc-by *
iris.unpaywall.metadataCallLastModified 06/09/2025 04:24:32 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1757125472154 -
iris.unpaywall.oastatus gold *
iris.unpaywall.pdfurl https://www.researchsquare.com/article/rs-4176128/latest.pdf *
isi.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION###1574-020X *
isi.category EV *
isi.contributor.affiliation Slovenian Academy of Sciences & Arts (SASA) -
isi.contributor.affiliation Charles University Prague -
isi.contributor.affiliation Slovenian Academy of Sciences & Arts (SASA) -
isi.contributor.affiliation Slovenian Academy of Sciences & Arts (SASA) -
isi.contributor.affiliation Lancaster University -
isi.contributor.affiliation Bulgarian Academy of Sciences -
isi.contributor.affiliation Polish Academy of Sciences -
isi.contributor.affiliation Eberhard Karls University of Tubingen -
isi.contributor.affiliation Polsko-Japonska Akademia Technik Komputerowych -
isi.contributor.affiliation Slovenian Academy of Sciences & Arts (SASA) -
isi.contributor.affiliation Inst Contemporary Hist -
isi.contributor.affiliation Slovenian Academy of Sciences & Arts (SASA) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Universidade de Lisboa -
isi.contributor.affiliation Arni Magnusson Inst Iceland Studies -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Pompeu Fabra University -
isi.contributor.affiliation Universitat Jaume I -
isi.contributor.affiliation -
isi.contributor.affiliation Paul Valery Univ Montpellier 3 -
isi.contributor.affiliation Athena Res & Innovat Ctr Informat Commun & Knowled -
isi.contributor.affiliation University of Amsterdam -
isi.contributor.affiliation University of Basque Country -
isi.contributor.affiliation University of Tartu -
isi.contributor.affiliation Inst Contemporary Hist -
isi.contributor.affiliation HUN-REN -
isi.contributor.affiliation Universidade de Santiago de Compostela -
isi.contributor.affiliation University of Tartu -
isi.contributor.affiliation University of Copenhagen -
isi.contributor.affiliation Bulgarian Academy of Sciences -
isi.contributor.affiliation Natl Lib Norway -
isi.contributor.affiliation University of Helsinki -
isi.contributor.affiliation Lancaster University -
isi.contributor.affiliation Universidade de Santiago de Compostela -
isi.contributor.affiliation Austrian Academy of Sciences -
isi.contributor.affiliation Uppsala University -
isi.contributor.affiliation Inst Contemporary Hist -
isi.contributor.country Slovenia -
isi.contributor.country Czech Republic -
isi.contributor.country Slovenia -
isi.contributor.country Slovenia -
isi.contributor.country England -
isi.contributor.country Bulgaria -
isi.contributor.country Poland -
isi.contributor.country Germany -
isi.contributor.country Poland -
isi.contributor.country Slovenia -
isi.contributor.country Slovenia -
isi.contributor.country Slovenia -
isi.contributor.country Italy -
isi.contributor.country Portugal -
isi.contributor.country Iceland -
isi.contributor.country Italy -
isi.contributor.country Spain -
isi.contributor.country Spain -
isi.contributor.country -
isi.contributor.country France -
isi.contributor.country Greece -
isi.contributor.country Netherlands -
isi.contributor.country Spain -
isi.contributor.country Estonia -
isi.contributor.country Slovenia -
isi.contributor.country Hungary -
isi.contributor.country Spain -
isi.contributor.country Estonia -
isi.contributor.country Denmark -
isi.contributor.country Bulgaria -
isi.contributor.country Norway -
isi.contributor.country Finland -
isi.contributor.country England -
isi.contributor.country Spain -
isi.contributor.country Austria -
isi.contributor.country Sweden -
isi.contributor.country Slovenia -
isi.contributor.name Tomaz -
isi.contributor.name Matyas -
isi.contributor.name Nikola -
isi.contributor.name Taja -
isi.contributor.name Paul -
isi.contributor.name Petya -
isi.contributor.name Maciej -
isi.contributor.name Cagri -
isi.contributor.name Danijel -
isi.contributor.name Katja -
isi.contributor.name Jure -
isi.contributor.name Peter -
isi.contributor.name Tommaso -
isi.contributor.name Jose -
isi.contributor.name Starkaour -
isi.contributor.name Roberto -
isi.contributor.name Nuria -
isi.contributor.name Maria Calzada -
isi.contributor.name Roberts -
isi.contributor.name Sascha -
isi.contributor.name Maria -
isi.contributor.name Ruben -
isi.contributor.name Mikel -
isi.contributor.name Neeme -
isi.contributor.name Anna -
isi.contributor.name Noemi -
isi.contributor.name Carmen -
isi.contributor.name Martin -
isi.contributor.name Costanza -
isi.contributor.name Kiril -
isi.contributor.name Lars Magne -
isi.contributor.name Jouni -
isi.contributor.name John -
isi.contributor.name Adina Ioana -
isi.contributor.name Tanja -
isi.contributor.name Vaino -
isi.contributor.name Darja -
isi.contributor.researcherId LBG-9042-2024 -
isi.contributor.researcherId M-6466-2017 -
isi.contributor.researcherId DWU-6583-2022 -
isi.contributor.researcherId LJK-2393-2024 -
isi.contributor.researcherId HKW-7858-2023 -
isi.contributor.researcherId P-2523-2019 -
isi.contributor.researcherId KDI-1218-2024 -
isi.contributor.researcherId EPM-9793-2022 -
isi.contributor.researcherId K-5168-2014 -
isi.contributor.researcherId KPP-7991-2024 -
isi.contributor.researcherId LZY-6766-2025 -
isi.contributor.researcherId FVT-4618-2022 -
isi.contributor.researcherId LZI-8973-2025 -
isi.contributor.researcherId ENK-8183-2022 -
isi.contributor.researcherId ENF-4114-2022 -
isi.contributor.researcherId ELG-2280-2022 -
isi.contributor.researcherId K-4604-2014 -
isi.contributor.researcherId DMM-7348-2022 -
isi.contributor.researcherId CLY-6227-2022 -
isi.contributor.researcherId CNU-2174-2022 -
isi.contributor.researcherId EZW-4700-2022 -
isi.contributor.researcherId EAF-3708-2022 -
isi.contributor.researcherId IAM-1255-2023 -
isi.contributor.researcherId LZV-8338-2025 -
isi.contributor.researcherId LZT-5463-2025 -
isi.contributor.researcherId ISU-0196-2023 -
isi.contributor.researcherId LZM-8785-2025 -
isi.contributor.researcherId FLV-0056-2022 -
isi.contributor.researcherId FZG-7018-2022 -
isi.contributor.researcherId AAB-3393-2019 -
isi.contributor.researcherId GGC-1122-2022 -
isi.contributor.researcherId E-9330-2019 -
isi.contributor.researcherId GDN-6366-2022 -
isi.contributor.researcherId MVV-9073-2025 -
isi.contributor.researcherId GJH-3353-2022 -
isi.contributor.researcherId LZQ-4144-2025 -
isi.contributor.researcherId FYR-1473-2022 -
isi.contributor.subaffiliation Dept Knowledge Technol -
isi.contributor.subaffiliation Inst Formal & Appl Linguist -
isi.contributor.subaffiliation Dept Knowledge Technol -
isi.contributor.subaffiliation Dept Knowledge Technol -
isi.contributor.subaffiliation UCREL NLP Res Grp -
isi.contributor.subaffiliation Inst Informat & Commun Technol -
isi.contributor.subaffiliation Inst Comp Sci -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Dept Multimedia -
isi.contributor.subaffiliation Dept Knowledge Technol -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Dept Knowledge Technol -
isi.contributor.subaffiliation Inst Legal Informat & Judicial Syst -
isi.contributor.subaffiliation Ctr Linguist -
isi.contributor.subaffiliation Dept Iceland -
isi.contributor.subaffiliation Ist Linguist Computazionale -
isi.contributor.subaffiliation Dept Translat & Language Sci -
isi.contributor.subaffiliation Dept Traducc & Comunicac -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Praxiling UMR CNRS 5267 -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Informat Retrieval Lab -
isi.contributor.subaffiliation HiTZ Basque Ctr Language Technol -
isi.contributor.subaffiliation Inst Comp Sci -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Language Technol Res Grp -
isi.contributor.subaffiliation Galician Language Inst -
isi.contributor.subaffiliation Johan Skytte Inst Polit Studies -
isi.contributor.subaffiliation Dept Nord Studies & Linguist -
isi.contributor.subaffiliation Inst Informat & Commun Technol -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Helsinki Inst Social Sci & Humanities -
isi.contributor.subaffiliation UCREL NLP Res Grp -
isi.contributor.subaffiliation Galician Language Inst -
isi.contributor.subaffiliation Austrian Ctr Digital Humanities & Cultural Heritag -
isi.contributor.subaffiliation Dept Stat -
isi.contributor.subaffiliation -
isi.contributor.surname Erjavec -
isi.contributor.surname Kopp -
isi.contributor.surname Ljubesic -
isi.contributor.surname Kuzman -
isi.contributor.surname Rayson -
isi.contributor.surname Osenova -
isi.contributor.surname Ogrodniczuk -
isi.contributor.surname Coeltekin -
isi.contributor.surname Korzinek -
isi.contributor.surname Meden -
isi.contributor.surname Skubic -
isi.contributor.surname Rupnik -
isi.contributor.surname Agnoloni -
isi.contributor.surname Aires -
isi.contributor.surname Barkarson -
isi.contributor.surname Bartolini -
isi.contributor.surname Bel -
isi.contributor.surname Perez -
isi.contributor.surname Dargis -
isi.contributor.surname Diwersy -
isi.contributor.surname Gavriilidou -
isi.contributor.surname van Heusden -
isi.contributor.surname Iruskieta -
isi.contributor.surname Kahusk -
isi.contributor.surname Kryvenko -
isi.contributor.surname Ligeti-Nagy -
isi.contributor.surname Magarinos -
isi.contributor.surname Moelder -
isi.contributor.surname Navarretta -
isi.contributor.surname Simov -
isi.contributor.surname Tungland -
isi.contributor.surname Tuominen -
isi.contributor.surname Vidler -
isi.contributor.surname Vladu -
isi.contributor.surname Wissik -
isi.contributor.surname Yrjanainen -
isi.contributor.surname Fiser -
isi.date.issued 2025 *
isi.description.abstracteng The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed. *
isi.description.allpeopleoriginal Erjavec, T; Kopp, M; Ljubesic, N; Kuzman, T; Rayson, P; Osenova, P; Ogrodniczuk, M; Cöltekin, C; Korzinek, D; Meden, K; Skubic, J; Rupnik, P; Agnoloni, T; Aires, J; Barkarson, S; Bartolini, R; Bel, N; Pérez, MC; Dargis, R; Diwersy, S; Gavriilidou, M; van Heusden, R; Iruskieta, M; Kahusk, N; Kryvenko, A; Ligeti-Nagy, N; Magariños, C; Mölder, M; Navarretta, C; Simov, K; Tungland, LM; Tuominen, J; Vidler, J; Vladu, AI; Wissik, T; Yrjänäinen, V; Fiser, D; *
isi.document.sourcetype WOS.SCI *
isi.document.type Article *
isi.document.types Article *
isi.identifier.doi 10.1007/s10579-024-09798-w *
isi.identifier.eissn 1574-0218 *
isi.identifier.isi WOS:001385018200001 *
isi.journal.journaltitle LANGUAGE RESOURCES AND EVALUATION *
isi.journal.journaltitleabbrev LANG RESOUR EVAL *
isi.language.original English *
isi.publisher.place VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS *
isi.relation.firstpage 2071 *
isi.relation.issue 3 *
isi.relation.lastpage 2102 *
isi.relation.volume 59 *
isi.title ParlaMint II: advancing comparable parliamentary corpora across Europe *
scopus.authority.ancejournal LANGUAGE RESOURCES AND EVALUATION###1574-020X *
scopus.category 1203 *
scopus.category 3304 *
scopus.category 3310 *
scopus.category 3309 *
scopus.contributor.affiliation Jožef Stefan Institute -
scopus.contributor.affiliation Charles University -
scopus.contributor.affiliation Institute of Contemporary History -
scopus.contributor.affiliation Jožef Stefan Institute -
scopus.contributor.affiliation Lancaster University -
scopus.contributor.affiliation Bulgarian Academy of Sciences -
scopus.contributor.affiliation Polish Academy of Sciences -
scopus.contributor.affiliation University of Tübingen -
scopus.contributor.affiliation Polish-Japanese Academy of Information Technology -
scopus.contributor.affiliation Institute of Contemporary History -
scopus.contributor.affiliation Institute of Contemporary History -
scopus.contributor.affiliation Jožef Stefan Institute -
scopus.contributor.affiliation CNR -
scopus.contributor.affiliation University of Lisbon -
scopus.contributor.affiliation The Árni Magnússon Institute for Icelandic Studies -
scopus.contributor.affiliation CNR -
scopus.contributor.affiliation Pompeu Fabra University -
scopus.contributor.affiliation Universitat Jaume I -
scopus.contributor.affiliation IMCS at the University of Latvia -
scopus.contributor.affiliation Paul Valéry University Montpellier 3 -
scopus.contributor.affiliation Athena Research & Innovation Center in Information Communication & Knowledge Technologies -
scopus.contributor.affiliation University of Amsterdam -
scopus.contributor.affiliation University of the Basque Country (UPV/EHU) -
scopus.contributor.affiliation University of Tartu -
scopus.contributor.affiliation NISS -
scopus.contributor.affiliation HUN-REN Hungarian Research Centre for Linguistics -
scopus.contributor.affiliation University of Santiago de Compostela -
scopus.contributor.affiliation University of Tartu -
scopus.contributor.affiliation University of Copenhagen -
scopus.contributor.affiliation Bulgarian Academy of Sciences -
scopus.contributor.affiliation National Library of Norway -
scopus.contributor.affiliation University of Helsinki -
scopus.contributor.affiliation Lancaster University -
scopus.contributor.affiliation University of Santiago de Compostela -
scopus.contributor.affiliation Austrian Academy of Sciences -
scopus.contributor.affiliation Uppsala University -
scopus.contributor.affiliation Institute of Contemporary History -
scopus.contributor.afid 60023955 -
scopus.contributor.afid 60016605 -
scopus.contributor.afid 129102120 -
scopus.contributor.afid 60023955 -
scopus.contributor.afid 60023643 -
scopus.contributor.afid 60109565 -
scopus.contributor.afid 60010993 -
scopus.contributor.afid 60017246 -
scopus.contributor.afid 60027964 -
scopus.contributor.afid 129102120 -
scopus.contributor.afid 129102120 -
scopus.contributor.afid 60023955 -
scopus.contributor.afid 60021199 -
scopus.contributor.afid 60041243 -
scopus.contributor.afid 60071113 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60032942 -
scopus.contributor.afid 60002676 -
scopus.contributor.afid 60071043 -
scopus.contributor.afid 60009278 -
scopus.contributor.afid 60104388 -
scopus.contributor.afid 60002483 -
scopus.contributor.afid 60027856 -
scopus.contributor.afid 60068856 -
scopus.contributor.afid 131271276 -
scopus.contributor.afid 60020907 -
scopus.contributor.afid 60028419 -
scopus.contributor.afid 60068856 -
scopus.contributor.afid 60030840 -
scopus.contributor.afid 60109565 -
scopus.contributor.afid 125128547 -
scopus.contributor.afid 60002952 -
scopus.contributor.afid 60023643 -
scopus.contributor.afid 60028419 -
scopus.contributor.afid 60003156 -
scopus.contributor.afid 60003858 -
scopus.contributor.afid 129102120 -
scopus.contributor.auid 56151465000 -
scopus.contributor.auid 57195428424 -
scopus.contributor.auid 56829162700 -
scopus.contributor.auid 57197735572 -
scopus.contributor.auid 8652019200 -
scopus.contributor.auid 8933829900 -
scopus.contributor.auid 54880531200 -
scopus.contributor.auid 56548968900 -
scopus.contributor.auid 15042645500 -
scopus.contributor.auid 57222076348 -
scopus.contributor.auid 58046200700 -
scopus.contributor.auid 59454607400 -
scopus.contributor.auid 57199421725 -
scopus.contributor.auid 59134232000 -
scopus.contributor.auid 57205404526 -
scopus.contributor.auid 22333654100 -
scopus.contributor.auid 55369471300 -
scopus.contributor.auid 6506330957 -
scopus.contributor.auid 56982845800 -
scopus.contributor.auid 57194974345 -
scopus.contributor.auid 57219589249 -
scopus.contributor.auid 57211109229 -
scopus.contributor.auid 27667722000 -
scopus.contributor.auid 6505791173 -
scopus.contributor.auid 57218937567 -
scopus.contributor.auid 57205401241 -
scopus.contributor.auid 57140569700 -
scopus.contributor.auid 55258287100 -
scopus.contributor.auid 14058464000 -
scopus.contributor.auid 8835805500 -
scopus.contributor.auid 59134232100 -
scopus.contributor.auid 24386059900 -
scopus.contributor.auid 57015236100 -
scopus.contributor.auid 57226647109 -
scopus.contributor.auid 55842078100 -
scopus.contributor.auid 57579634800 -
scopus.contributor.auid 25121446200 -
scopus.contributor.country Slovenia -
scopus.contributor.country Czech Republic -
scopus.contributor.country Slovenia -
scopus.contributor.country Slovenia -
scopus.contributor.country United Kingdom -
scopus.contributor.country Bulgaria -
scopus.contributor.country Poland -
scopus.contributor.country Germany -
scopus.contributor.country Poland -
scopus.contributor.country Slovenia -
scopus.contributor.country Slovenia -
scopus.contributor.country Slovenia -
scopus.contributor.country Italy -
scopus.contributor.country Portugal -
scopus.contributor.country Iceland -
scopus.contributor.country Italy -
scopus.contributor.country Spain -
scopus.contributor.country Spain -
scopus.contributor.country Latvia -
scopus.contributor.country France -
scopus.contributor.country Greece -
scopus.contributor.country Netherlands -
scopus.contributor.country Spain -
scopus.contributor.country Estonia -
scopus.contributor.country Ukraine -
scopus.contributor.country Hungary -
scopus.contributor.country Spain -
scopus.contributor.country Estonia -
scopus.contributor.country Denmark -
scopus.contributor.country Bulgaria -
scopus.contributor.country Norway -
scopus.contributor.country Finland -
scopus.contributor.country United Kingdom -
scopus.contributor.country Spain -
scopus.contributor.country Austria -
scopus.contributor.country Sweden -
scopus.contributor.country Slovenia -
scopus.contributor.dptid -
scopus.contributor.dptid 103854668 -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid 104417131 -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid 125212332 -
scopus.contributor.dptid 131805289 -
scopus.contributor.dptid 104786102 -
scopus.contributor.dptid -
scopus.contributor.dptid 108255548 -
scopus.contributor.dptid 105729190 -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid 129214040 -
scopus.contributor.dptid -
scopus.contributor.dptid 113098505 -
scopus.contributor.dptid -
scopus.contributor.dptid 132232694 -
scopus.contributor.dptid 131272211 -
scopus.contributor.dptid -
scopus.contributor.dptid 104565667 -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid 128336596 -
scopus.contributor.dptid -
scopus.contributor.dptid 131272211 -
scopus.contributor.dptid 127562805 -
scopus.contributor.dptid 103243768 -
scopus.contributor.dptid -
scopus.contributor.name Tomaž -
scopus.contributor.name Matyáš -
scopus.contributor.name Nikola -
scopus.contributor.name Taja -
scopus.contributor.name Paul -
scopus.contributor.name Petya -
scopus.contributor.name Maciej -
scopus.contributor.name Çağrı -
scopus.contributor.name Danijel -
scopus.contributor.name Katja -
scopus.contributor.name Jure -
scopus.contributor.name Peter -
scopus.contributor.name Tommaso -
scopus.contributor.name José -
scopus.contributor.name Starkaður -
scopus.contributor.name Roberto -
scopus.contributor.name Núria -
scopus.contributor.name María -
scopus.contributor.name Roberts -
scopus.contributor.name Sascha -
scopus.contributor.name Maria -
scopus.contributor.name Ruben -
scopus.contributor.name Mikel -
scopus.contributor.name Neeme -
scopus.contributor.name Anna -
scopus.contributor.name Noémi -
scopus.contributor.name Carmen -
scopus.contributor.name Martin -
scopus.contributor.name Costanza -
scopus.contributor.name Kiril -
scopus.contributor.name Lars Magne -
scopus.contributor.name Jouni -
scopus.contributor.name John -
scopus.contributor.name Adina Ioana -
scopus.contributor.name Tanja -
scopus.contributor.name Väinö -
scopus.contributor.name Darja -
scopus.contributor.subaffiliation Department of Knowledge Technologies; -
scopus.contributor.subaffiliation Institute of Formal and Applied Linguistics;Faculty of Mathematics and Physics; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Department of Knowledge Technologies; -
scopus.contributor.subaffiliation UCREL NLP research group; -
scopus.contributor.subaffiliation Institute of Information and Communication Technologies; -
scopus.contributor.subaffiliation Institute of Computer Science; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Department of Multimedia; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Department of Knowledge Technologies; -
scopus.contributor.subaffiliation Institute of Legal Informatics and Judicial Systems; -
scopus.contributor.subaffiliation School of Arts and Humanities - Centre of Linguistics; -
scopus.contributor.subaffiliation Department of Icelandic; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Department of Translation and Language Sciences; -
scopus.contributor.subaffiliation Departamento de Traducción y Comunicación; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Praxiling UMR 5267 CNRS; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Information Retrieval Lab; -
scopus.contributor.subaffiliation HiTZ Basque Center for Language Technology;Ixa; -
scopus.contributor.subaffiliation Institute of Computer Science; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Language Technology Research Group; -
scopus.contributor.subaffiliation Galician Language Institute; -
scopus.contributor.subaffiliation Johan Skytte Institute of Political Studies; -
scopus.contributor.subaffiliation Department of Nordic Studies and Linguistics; -
scopus.contributor.subaffiliation Institute of Information and Communication Technologies; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Helsinki Institute for Social Sciences and Humanities; -
scopus.contributor.subaffiliation UCREL NLP research group; -
scopus.contributor.subaffiliation Galician Language Institute; -
scopus.contributor.subaffiliation Austrian Centre for Digital Humanities and Cultural Heritage; -
scopus.contributor.subaffiliation Department of Statistics; -
scopus.contributor.subaffiliation -
scopus.contributor.surname Erjavec -
scopus.contributor.surname Kopp -
scopus.contributor.surname Ljubešić -
scopus.contributor.surname Kuzman -
scopus.contributor.surname Rayson -
scopus.contributor.surname Osenova -
scopus.contributor.surname Ogrodniczuk -
scopus.contributor.surname Çöltekin -
scopus.contributor.surname Koržinek -
scopus.contributor.surname Meden -
scopus.contributor.surname Skubic -
scopus.contributor.surname Rupnik -
scopus.contributor.surname Agnoloni -
scopus.contributor.surname Aires -
scopus.contributor.surname Barkarson -
scopus.contributor.surname Bartolini -
scopus.contributor.surname Bel -
scopus.contributor.surname Calzada Pérez -
scopus.contributor.surname Darģis -
scopus.contributor.surname Diwersy -
scopus.contributor.surname Gavriilidou -
scopus.contributor.surname van Heusden -
scopus.contributor.surname Iruskieta -
scopus.contributor.surname Kahusk -
scopus.contributor.surname Kryvenko -
scopus.contributor.surname Ligeti-Nagy -
scopus.contributor.surname Magariños -
scopus.contributor.surname Mölder -
scopus.contributor.surname Navarretta -
scopus.contributor.surname Simov -
scopus.contributor.surname Tungland -
scopus.contributor.surname Tuominen -
scopus.contributor.surname Vidler -
scopus.contributor.surname Vladu -
scopus.contributor.surname Wissik -
scopus.contributor.surname Yrjänäinen -
scopus.contributor.surname Fišer -
scopus.date.issued 2025 *
scopus.description.abstracteng The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed. *
scopus.description.allpeopleoriginal Erjavec T.; Kopp M.; Ljubesic N.; Kuzman T.; Rayson P.; Osenova P.; Ogrodniczuk M.; Coltekin C.; Korzinek D.; Meden K.; Skubic J.; Rupnik P.; Agnoloni T.; Aires J.; Barkarson S.; Bartolini R.; Bel N.; Calzada Perez M.; Dargis R.; Diwersy S.; Gavriilidou M.; van Heusden R.; Iruskieta M.; Kahusk N.; Kryvenko A.; Ligeti-Nagy N.; Magarinos C.; Molder M.; Navarretta C.; Simov K.; Tungland L.M.; Tuominen J.; Vidler J.; Vladu A.I.; Wissik T.; Yrjanainen V.; Fiser D. *
scopus.differences scopus.relation.lastpage *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.firstpage *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.relation.issue *
scopus.differences scopus.date.issued *
scopus.differences scopus.identifier.doi *
scopus.differences scopus.relation.volume *
scopus.document.type ar *
scopus.document.types ar *
scopus.funding.funders 501100001734 - Københavns Universitet; 501100002341 - Research Council of Finland; 501100000780 - European Commission; 501100015068 - Universidade de Santiago de Compostela; 501100001822 - Österreichische Akademie der Wissenschaften; 501100010801 - Xunta de Galicia; 501100004382 - Polska Akademia Nauk; 501100003451 - Euskal Herriko Unibertsitatea; 501100008530 - European Regional Development Fund; 501100004329 - The Slovenian Research and Innovation Agency; 501100004329 - The Slovenian Research and Innovation Agency; 501100001823 - Ministerstvo Školství, Mládeže a Tělovýchovy; 501100001823 - Ministerstvo Školství, Mládeže a Tělovýchovy; 501100004837 - Ministerio de Ciencia e Innovación; 501100004837 - Ministerio de Ciencia e Innovación; 501100004569 - Ministerstwo Edukacji i Nauki; 501100004569 - Ministerstwo Edukacji i Nauki; 501100001871 - Fundação para a Ciência e a Tecnologia; 501100001871 - Fundação para a Ciência e a Tecnologia; 501100005992 - Ministry of Education and Science; 501100005992 - Ministry of Education and Science; *
scopus.funding.ids J7-4642; LM2023062; PID2019-108866RB-I00 / AEI / 10.13039/501100011033; 2022/WK/09; P6-0411; N6-0099; N6-0288; P2-0103; P6-0436; UIDP/00214/2020; 2022–2024; DO1-301/17.12.21; *
scopus.identifier.doi 10.1007/s10579-024-09798-w *
scopus.identifier.eissn 1574-0218 *
scopus.identifier.pui 2032781314 *
scopus.identifier.scopus 2-s2.0-85213520565 *
scopus.journal.sourceid 145663 *
scopus.language.iso eng *
scopus.publisher.name Springer Science and Business Media B.V. *
scopus.relation.firstpage 2071 *
scopus.relation.issue 3 *
scopus.relation.lastpage 2102 *
scopus.relation.volume 59 *
scopus.subject.keywords Comparable corpora; Parliamentary proceedings; TEI; *
scopus.title ParlaMint II: advancing comparable parliamentary corpora across Europe *
scopus.titleeng ParlaMint II: advancing comparable parliamentary corpora across Europe *
Appears in types: 01.01 Journal article
Files in this item:
File Size Format
parla_mint_ii_advancing_comparable_parliamentary_corpora_across_europe-1.pdf

authorized users only

License: NOT PUBLIC - Private/restricted access
Size 1.1 MB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/483041
Citations
  • PMC ND
  • Scopus 13
  • ISI (Web of Science) 11