ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe
Agnoloni, Tommaso; Bartolini, Roberto
2024
Abstract
The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | LANGUAGE RESOURCES AND EVALUATION | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.orgunit | Istituto di Informatica Giuridica e Sistemi Giudiziari - IGSG | en |
| dc.authority.people | Erjavec, Tomaž | en |
| dc.authority.people | Kopp, Matyáš | en |
| dc.authority.people | Ljubešić, Nikola | en |
| dc.authority.people | Kuzman, Taja | en |
| dc.authority.people | Rayson, Paul | en |
| dc.authority.people | Osenova, Petya | en |
| dc.authority.people | Ogrodniczuk, Maciej | en |
| dc.authority.people | Çöltekin, Çağrı | en |
| dc.authority.people | Koržinek, Danijel | en |
| dc.authority.people | Meden, Katja | en |
| dc.authority.people | Skubic, Jure | en |
| dc.authority.people | Rupnik, Peter | en |
| dc.authority.people | Agnoloni, Tommaso | en |
| dc.authority.people | Aires, José | en |
| dc.authority.people | Barkarson, Starkaður | en |
| dc.authority.people | Bartolini, Roberto | en |
| dc.authority.people | Bel, Núria | en |
| dc.authority.people | Pérez, María Calzada | en |
| dc.authority.people | Darģis, Roberts | en |
| dc.authority.people | Diwersy, Sascha | en |
| dc.authority.people | Gavriilidou, Maria | en |
| dc.authority.people | Heusden, Ruben van | en |
| dc.authority.people | Iruskieta, Mikel | en |
| dc.authority.people | Kahusk, Neeme | en |
| dc.authority.people | Kryvenko, Anna | en |
| dc.authority.people | Ligeti-Nagy, Noémi | en |
| dc.authority.people | Magariños, Carmen | en |
| dc.authority.people | Mölder, Martin | en |
| dc.authority.people | Navarretta, Costanza | en |
| dc.authority.people | Simov, Kiril | en |
| dc.authority.people | Tungland, Lars Magne | en |
| dc.authority.people | Tuominen, Jouni | en |
| dc.authority.people | Vidler, John | en |
| dc.authority.people | Vladu, Adina Ioana | en |
| dc.authority.people | Wissik, Tanja | en |
| dc.authority.people | Yrjänäinen, Väinö | en |
| dc.authority.people | Fišer, Darja | en |
| dc.authority.project | CLARIN | en |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Articolo in rivista | * |
| dc.contributor.appartenenza | Istituto di Informatica Giuridica e Sistemi Giudiziari - IGSG | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.appartenenza.mi | 1108 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2024/08/27 10:49:32 | - |
| dc.date.available | 2024/08/27 10:49:32 | - |
| dc.date.firstsubmission | 2024/07/05 18:36:55 | * |
| dc.date.issued | 2024 | - |
| dc.date.submission | 2025/03/07 15:04:33 | * |
| dc.description.abstracteng | The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed. | - |
| dc.description.allpeople | Erjavec, Tomaž; Kopp, Matyáš; Ljubešić, Nikola; Kuzman, Taja; Rayson, Paul; Osenova, Petya; Ogrodniczuk, Maciej; Çöltekin, Çağrı; Koržinek, Danijel; Meden, Katja; Skubic, Jure; Rupnik, Peter; Agnoloni, Tommaso; Aires, José; Barkarson, Starkaður; Bartolini, Roberto; Bel, Núria; Pérez, María Calzada; Darģis, Roberts; Diwersy, Sascha; Gavriilidou, Maria; Heusden, Ruben van; Iruskieta, Mikel; Kahusk, Neeme; Kryvenko, Anna; Ligeti-Nagy, Noémi; Magariños, Carmen; Mölder, Martin; Navarretta, Costanza; Simov, Kiril; Tungland, Lars Magne; Tuominen, Jouni; Vidler, John; Vladu, Adina Ioana; Wissik, Tanja; Yrjänäinen, Väinö; Fišer, Darja | - |
| dc.description.allpeopleoriginal | Erjavec, Tomaž; Kopp, Matyáš; Ljubešić, Nikola; Kuzman, Taja; Rayson, Paul; Osenova, Petya; Ogrodniczuk, Maciej; Çöltekin, Çağrı; Koržinek, Danijel; Meden, Katja; Skubic, Jure; Rupnik, Peter; Agnoloni, Tommaso; Aires, José; Barkarson, Starkaður; Bartolini, Roberto; Bel, Núria; Pérez, María Calzada; Darģis, Roberts; Diwersy, Sascha; Gavriilidou, Maria; Heusden, Ruben van; Iruskieta, Mikel; Kahusk, Neeme; Kryvenko, Anna; Ligeti-Nagy, Noémi; Magariños, Carmen; Mölder, Martin; Navarretta, Costanza; Simov, Kiril; Tungland, Lars Magne; Tuominen, Jouni; Vidler, John; Vladu, Adina Ioana; Wissik, Tanja; Yrjänäinen, Väinö; Fišer, Darja | en |
| dc.description.fulltext | restricted | en |
| dc.description.numberofauthors | 37 | - |
| dc.identifier.doi | 10.21203/rs.3.rs-4176128/v1 | en |
| dc.identifier.isi | WOS:001385018200001 | - |
| dc.identifier.scopus | 2-s2.0-85213520565 | en |
| dc.identifier.source | crossref | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/483041 | - |
| dc.language.iso | eng | en |
| dc.relation.medium | ELETTRONICO | en |
| dc.relation.projectAcronym | - | en |
| dc.relation.projectAwardNumber | - | en |
| dc.relation.projectAwardTitle | CLARIN | en |
| dc.relation.projectFunderName | - | en |
| dc.relation.projectFundingStream | - | en |
| dc.subject.keywordseng | Parliamentary proceedings | - |
| dc.subject.keywordseng | TEI | - |
| dc.subject.keywordseng | Comparable corpora | - |
| dc.subject.singlekeyword | Parliamentary proceedings | * |
| dc.subject.singlekeyword | TEI | * |
| dc.subject.singlekeyword | Comparable corpora | * |
| dc.title | ParlaMint II: Advancing Comparable Parliamentary Corpora Across Europe | en |
| dc.type.circulation | Internazionale | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Contributo su Rivista::01.01 Articolo in rivista | it |
| dc.type.impactfactor | si | en |
| dc.type.miur | 262 | - |
| iris.isi.extIssued | 2025 | - |
| iris.isi.extTitle | ParlaMint II: advancing comparable parliamentary corpora across Europe | - |
| iris.mediafilter.data | 2025/03/23 03:18:27 | * |
| iris.orcid.lastModifiedDate | 2025/09/06 01:09:30 | * |
| iris.orcid.lastModifiedMillisecond | 1757113770069 | * |
| iris.scopus.extIssued | 2025 | - |
| iris.scopus.extTitle | ParlaMint II: advancing comparable parliamentary corpora across Europe | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoaversion | acceptedVersion | * |
| iris.unpaywall.doi | 10.21203/rs.3.rs-4176128/v1 | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.landingpage | https://doi.org/10.21203/rs.3.rs-4176128/v1 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 06/09/2025 04:24:32 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1757125472154 | - |
| iris.unpaywall.oastatus | gold | * |
| iris.unpaywall.pdfurl | https://www.researchsquare.com/article/rs-4176128/latest.pdf | * |
| isi.authority.ancejournal | LANGUAGE RESOURCES AND EVALUATION###1574-020X | * |
| isi.category | EV | * |
| isi.contributor.affiliation | Slovenian Academy of Sciences & Arts (SASA) | - |
| isi.contributor.affiliation | Charles University Prague | - |
| isi.contributor.affiliation | Slovenian Academy of Sciences & Arts (SASA) | - |
| isi.contributor.affiliation | Slovenian Academy of Sciences & Arts (SASA) | - |
| isi.contributor.affiliation | Lancaster University | - |
| isi.contributor.affiliation | Bulgarian Academy of Sciences | - |
| isi.contributor.affiliation | Polish Academy of Sciences | - |
| isi.contributor.affiliation | Eberhard Karls University of Tubingen | - |
| isi.contributor.affiliation | Polsko-Japonska Akademia Technik Komputerowych | - |
| isi.contributor.affiliation | Slovenian Academy of Sciences & Arts (SASA) | - |
| isi.contributor.affiliation | Inst Contemporary Hist | - |
| isi.contributor.affiliation | Slovenian Academy of Sciences & Arts (SASA) | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Universidade de Lisboa | - |
| isi.contributor.affiliation | Arni Magnusson Inst Iceland Studies | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Pompeu Fabra University | - |
| isi.contributor.affiliation | Universitat Jaume I | - |
| isi.contributor.affiliation | - | |
| isi.contributor.affiliation | Paul Valery Univ Montpellier 3 | - |
| isi.contributor.affiliation | Athena Res & Innovat Ctr Informat Commun & Knowled | - |
| isi.contributor.affiliation | University of Amsterdam | - |
| isi.contributor.affiliation | University of Basque Country | - |
| isi.contributor.affiliation | University of Tartu | - |
| isi.contributor.affiliation | Inst Contemporary Hist | - |
| isi.contributor.affiliation | HUN-REN | - |
| isi.contributor.affiliation | Universidade de Santiago de Compostela | - |
| isi.contributor.affiliation | University of Tartu | - |
| isi.contributor.affiliation | University of Copenhagen | - |
| isi.contributor.affiliation | Bulgarian Academy of Sciences | - |
| isi.contributor.affiliation | Natl Lib Norway | - |
| isi.contributor.affiliation | University of Helsinki | - |
| isi.contributor.affiliation | Lancaster University | - |
| isi.contributor.affiliation | Universidade de Santiago de Compostela | - |
| isi.contributor.affiliation | Austrian Academy of Sciences | - |
| isi.contributor.affiliation | Uppsala University | - |
| isi.contributor.affiliation | Inst Contemporary Hist | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | Czech Republic | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | England | - |
| isi.contributor.country | Bulgaria | - |
| isi.contributor.country | Poland | - |
| isi.contributor.country | Germany | - |
| isi.contributor.country | Poland | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Portugal | - |
| isi.contributor.country | Iceland | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Spain | - |
| isi.contributor.country | Spain | - |
| isi.contributor.country | - | |
| isi.contributor.country | France | - |
| isi.contributor.country | Greece | - |
| isi.contributor.country | Netherlands | - |
| isi.contributor.country | Spain | - |
| isi.contributor.country | Estonia | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.country | Hungary | - |
| isi.contributor.country | Spain | - |
| isi.contributor.country | Estonia | - |
| isi.contributor.country | Denmark | - |
| isi.contributor.country | Bulgaria | - |
| isi.contributor.country | Norway | - |
| isi.contributor.country | Finland | - |
| isi.contributor.country | England | - |
| isi.contributor.country | Spain | - |
| isi.contributor.country | Austria | - |
| isi.contributor.country | Sweden | - |
| isi.contributor.country | Slovenia | - |
| isi.contributor.name | Tomaz | - |
| isi.contributor.name | Matyas | - |
| isi.contributor.name | Nikola | - |
| isi.contributor.name | Taja | - |
| isi.contributor.name | Paul | - |
| isi.contributor.name | Petya | - |
| isi.contributor.name | Maciej | - |
| isi.contributor.name | Cagri | - |
| isi.contributor.name | Danijel | - |
| isi.contributor.name | Katja | - |
| isi.contributor.name | Jure | - |
| isi.contributor.name | Peter | - |
| isi.contributor.name | Tommaso | - |
| isi.contributor.name | Jose | - |
| isi.contributor.name | Starkaour | - |
| isi.contributor.name | Roberto | - |
| isi.contributor.name | Nuria | - |
| isi.contributor.name | Maria Calzada | - |
| isi.contributor.name | Roberts | - |
| isi.contributor.name | Sascha | - |
| isi.contributor.name | Maria | - |
| isi.contributor.name | Ruben | - |
| isi.contributor.name | Mikel | - |
| isi.contributor.name | Neeme | - |
| isi.contributor.name | Anna | - |
| isi.contributor.name | Noemi | - |
| isi.contributor.name | Carmen | - |
| isi.contributor.name | Martin | - |
| isi.contributor.name | Costanza | - |
| isi.contributor.name | Kiril | - |
| isi.contributor.name | Lars Magne | - |
| isi.contributor.name | Jouni | - |
| isi.contributor.name | John | - |
| isi.contributor.name | Adina Ioana | - |
| isi.contributor.name | Tanja | - |
| isi.contributor.name | Vaino | - |
| isi.contributor.name | Darja | - |
| isi.contributor.researcherId | LBG-9042-2024 | - |
| isi.contributor.researcherId | M-6466-2017 | - |
| isi.contributor.researcherId | DWU-6583-2022 | - |
| isi.contributor.researcherId | LJK-2393-2024 | - |
| isi.contributor.researcherId | HKW-7858-2023 | - |
| isi.contributor.researcherId | P-2523-2019 | - |
| isi.contributor.researcherId | KDI-1218-2024 | - |
| isi.contributor.researcherId | EPM-9793-2022 | - |
| isi.contributor.researcherId | K-5168-2014 | - |
| isi.contributor.researcherId | KPP-7991-2024 | - |
| isi.contributor.researcherId | LZY-6766-2025 | - |
| isi.contributor.researcherId | FVT-4618-2022 | - |
| isi.contributor.researcherId | LZI-8973-2025 | - |
| isi.contributor.researcherId | ENK-8183-2022 | - |
| isi.contributor.researcherId | ENF-4114-2022 | - |
| isi.contributor.researcherId | ELG-2280-2022 | - |
| isi.contributor.researcherId | K-4604-2014 | - |
| isi.contributor.researcherId | DMM-7348-2022 | - |
| isi.contributor.researcherId | CLY-6227-2022 | - |
| isi.contributor.researcherId | CNU-2174-2022 | - |
| isi.contributor.researcherId | EZW-4700-2022 | - |
| isi.contributor.researcherId | EAF-3708-2022 | - |
| isi.contributor.researcherId | IAM-1255-2023 | - |
| isi.contributor.researcherId | LZV-8338-2025 | - |
| isi.contributor.researcherId | LZT-5463-2025 | - |
| isi.contributor.researcherId | ISU-0196-2023 | - |
| isi.contributor.researcherId | LZM-8785-2025 | - |
| isi.contributor.researcherId | FLV-0056-2022 | - |
| isi.contributor.researcherId | FZG-7018-2022 | - |
| isi.contributor.researcherId | AAB-3393-2019 | - |
| isi.contributor.researcherId | GGC-1122-2022 | - |
| isi.contributor.researcherId | E-9330-2019 | - |
| isi.contributor.researcherId | GDN-6366-2022 | - |
| isi.contributor.researcherId | MVV-9073-2025 | - |
| isi.contributor.researcherId | GJH-3353-2022 | - |
| isi.contributor.researcherId | LZQ-4144-2025 | - |
| isi.contributor.researcherId | FYR-1473-2022 | - |
| isi.contributor.subaffiliation | Dept Knowledge Technol | - |
| isi.contributor.subaffiliation | Inst Formal & Appl Linguist | - |
| isi.contributor.subaffiliation | Dept Knowledge Technol | - |
| isi.contributor.subaffiliation | Dept Knowledge Technol | - |
| isi.contributor.subaffiliation | UCREL NLP Res Grp | - |
| isi.contributor.subaffiliation | Inst Informat & Commun Technol | - |
| isi.contributor.subaffiliation | Inst Comp Sci | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Dept Multimedia | - |
| isi.contributor.subaffiliation | Dept Knowledge Technol | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Dept Knowledge Technol | - |
| isi.contributor.subaffiliation | Inst Legal Informat & Judicial Syst | - |
| isi.contributor.subaffiliation | Ctr Linguist | - |
| isi.contributor.subaffiliation | Dept Iceland | - |
| isi.contributor.subaffiliation | Ist Linguist Computazionale | - |
| isi.contributor.subaffiliation | Dept Translat & Language Sci | - |
| isi.contributor.subaffiliation | Dept Traducc & Comunicac | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Praxiling UMR CNRS 5267 | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Informat Retrieval Lab | - |
| isi.contributor.subaffiliation | HiTZ Basque Ctr Language Techonol | - |
| isi.contributor.subaffiliation | Inst Comp Sci | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Language Technol Res Grp | - |
| isi.contributor.subaffiliation | Galician Language Inst | - |
| isi.contributor.subaffiliation | Johan Skytte Inst Polit Studies | - |
| isi.contributor.subaffiliation | Dept Nord Studies & Linguist | - |
| isi.contributor.subaffiliation | Inst Informat & Commun Technol | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Helsinki Inst Social Sci & Humanities | - |
| isi.contributor.subaffiliation | UCREL NLP Res Grp | - |
| isi.contributor.subaffiliation | Galician Language Inst | - |
| isi.contributor.subaffiliation | Austrian Ctr Digital Humanities & Cultural Heritag | - |
| isi.contributor.subaffiliation | Dept Stat | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.surname | Erjavec | - |
| isi.contributor.surname | Kopp | - |
| isi.contributor.surname | Ljubesic | - |
| isi.contributor.surname | Kuzman | - |
| isi.contributor.surname | Rayson | - |
| isi.contributor.surname | Osenova | - |
| isi.contributor.surname | Ogrodniczuk | - |
| isi.contributor.surname | Coeltekin | - |
| isi.contributor.surname | Korzinek | - |
| isi.contributor.surname | Meden | - |
| isi.contributor.surname | Skubic | - |
| isi.contributor.surname | Rupnik | - |
| isi.contributor.surname | Agnoloni | - |
| isi.contributor.surname | Aires | - |
| isi.contributor.surname | Barkarson | - |
| isi.contributor.surname | Bartolini | - |
| isi.contributor.surname | Bel | - |
| isi.contributor.surname | Perez | - |
| isi.contributor.surname | Dargis | - |
| isi.contributor.surname | Diwersy | - |
| isi.contributor.surname | Gavriilidou | - |
| isi.contributor.surname | van Heusden | - |
| isi.contributor.surname | Iruskieta | - |
| isi.contributor.surname | Kahusk | - |
| isi.contributor.surname | Kryvenko | - |
| isi.contributor.surname | Ligeti-Nagy | - |
| isi.contributor.surname | Magarinos | - |
| isi.contributor.surname | Moelder | - |
| isi.contributor.surname | Navarretta | - |
| isi.contributor.surname | Simov | - |
| isi.contributor.surname | Tungland | - |
| isi.contributor.surname | Tuominen | - |
| isi.contributor.surname | Vidler | - |
| isi.contributor.surname | Vladu | - |
| isi.contributor.surname | Wissik | - |
| isi.contributor.surname | Yrjanainen | - |
| isi.contributor.surname | Fiser | - |
| isi.date.issued | 2025 | * |
| isi.description.abstracteng | The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed. | * |
| isi.description.allpeopleoriginal | Erjavec, T; Kopp, M; Ljubesic, N; Kuzman, T; Rayson, P; Osenova, P; Ogrodniczuk, M; Cöltekin, C; Korzinek, D; Meden, K; Skubic, J; Rupnik, P; Agnoloni, T; Aires, J; Barkarson, S; Bartolini, R; Bel, N; Pérez, MC; Dargis, R; Diwersy, S; Gavriilidou, M; van Heusden, R; Iruskieta, M; Kahusk, N; Kryvenko, A; Ligeti-Nagy, N; Magariños, C; Mölder, M; Navarretta, C; Simov, K; Tungland, LM; Tuominen, J; Vidler, J; Vladu, AI; Wissik, T; Yrjänäinen, V; Fiser, D; | * |
| isi.document.sourcetype | WOS.SCI | * |
| isi.document.type | Article | * |
| isi.document.types | Article | * |
| isi.identifier.doi | 10.1007/s10579-024-09798-w | * |
| isi.identifier.eissn | 1574-0218 | * |
| isi.identifier.isi | WOS:001385018200001 | * |
| isi.journal.journaltitle | LANGUAGE RESOURCES AND EVALUATION | * |
| isi.journal.journaltitleabbrev | LANG RESOUR EVAL | * |
| isi.language.original | English | * |
| isi.publisher.place | VAN GODEWIJCKSTRAAT 30, 3311 GZ DORDRECHT, NETHERLANDS | * |
| isi.relation.firstpage | 2071 | * |
| isi.relation.issue | 3 | * |
| isi.relation.lastpage | 2102 | * |
| isi.relation.volume | 59 | * |
| isi.title | ParlaMint II: advancing comparable parliamentary corpora across Europe | * |
| scopus.authority.ancejournal | LANGUAGE RESOURCES AND EVALUATION###1574-020X | * |
| scopus.category | 1203 | * |
| scopus.category | 3304 | * |
| scopus.category | 3310 | * |
| scopus.category | 3309 | * |
| scopus.contributor.affiliation | Jožef Stefan Institute | - |
| scopus.contributor.affiliation | Charles University | - |
| scopus.contributor.affiliation | Institute of Contemporary History | - |
| scopus.contributor.affiliation | Jožef Stefan Institute | - |
| scopus.contributor.affiliation | Lancaster University | - |
| scopus.contributor.affiliation | Bulgarian Academy of Sciences | - |
| scopus.contributor.affiliation | Polish Academy of Sciences | - |
| scopus.contributor.affiliation | University of Tübingen | - |
| scopus.contributor.affiliation | Polish-Japanese Academy of Information Technology | - |
| scopus.contributor.affiliation | Institute of Contemporary History | - |
| scopus.contributor.affiliation | Institute of Contemporary History | - |
| scopus.contributor.affiliation | Jožef Stefan Institute | - |
| scopus.contributor.affiliation | CNR | - |
| scopus.contributor.affiliation | University of Lisbon | - |
| scopus.contributor.affiliation | The Árni Magnússon Institute for Icelandic Studies | - |
| scopus.contributor.affiliation | CNR | - |
| scopus.contributor.affiliation | Pompeu Fabra University | - |
| scopus.contributor.affiliation | Universitat Jaume I | - |
| scopus.contributor.affiliation | IMCS at the University of Latvia | - |
| scopus.contributor.affiliation | Paul Valéry University Montpellier 3 | - |
| scopus.contributor.affiliation | Athena Research & Innovation Center in Information Communication & Knowledge Technologies | - |
| scopus.contributor.affiliation | University of Amsterdam | - |
| scopus.contributor.affiliation | University of the Basque Country (UPV/EHU) | - |
| scopus.contributor.affiliation | University of Tartu | - |
| scopus.contributor.affiliation | NISS | - |
| scopus.contributor.affiliation | HUN-REN Hungarian Research Centre for Linguistics | - |
| scopus.contributor.affiliation | University of Santiago de Compostela | - |
| scopus.contributor.affiliation | University of Tartu | - |
| scopus.contributor.affiliation | University of Copenhagen | - |
| scopus.contributor.affiliation | Bulgarian Academy of Sciences | - |
| scopus.contributor.affiliation | National Library of Norway | - |
| scopus.contributor.affiliation | University of Helsinki | - |
| scopus.contributor.affiliation | Lancaster University | - |
| scopus.contributor.affiliation | University of Santiago de Compostela | - |
| scopus.contributor.affiliation | Austrian Academy of Sciences | - |
| scopus.contributor.affiliation | Uppsala University | - |
| scopus.contributor.affiliation | Institute of Contemporary History | - |
| scopus.contributor.afid | 60023955 | - |
| scopus.contributor.afid | 60016605 | - |
| scopus.contributor.afid | 129102120 | - |
| scopus.contributor.afid | 60023955 | - |
| scopus.contributor.afid | 60023643 | - |
| scopus.contributor.afid | 60109565 | - |
| scopus.contributor.afid | 60010993 | - |
| scopus.contributor.afid | 60017246 | - |
| scopus.contributor.afid | 60027964 | - |
| scopus.contributor.afid | 129102120 | - |
| scopus.contributor.afid | 129102120 | - |
| scopus.contributor.afid | 60023955 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.afid | 60041243 | - |
| scopus.contributor.afid | 60071113 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60032942 | - |
| scopus.contributor.afid | 60002676 | - |
| scopus.contributor.afid | 60071043 | - |
| scopus.contributor.afid | 60009278 | - |
| scopus.contributor.afid | 60104388 | - |
| scopus.contributor.afid | 60002483 | - |
| scopus.contributor.afid | 60027856 | - |
| scopus.contributor.afid | 60068856 | - |
| scopus.contributor.afid | 131271276 | - |
| scopus.contributor.afid | 60020907 | - |
| scopus.contributor.afid | 60028419 | - |
| scopus.contributor.afid | 60068856 | - |
| scopus.contributor.afid | 60030840 | - |
| scopus.contributor.afid | 60109565 | - |
| scopus.contributor.afid | 125128547 | - |
| scopus.contributor.afid | 60002952 | - |
| scopus.contributor.afid | 60023643 | - |
| scopus.contributor.afid | 60028419 | - |
| scopus.contributor.afid | 60003156 | - |
| scopus.contributor.afid | 60003858 | - |
| scopus.contributor.afid | 129102120 | - |
| scopus.contributor.auid | 56151465000 | - |
| scopus.contributor.auid | 57195428424 | - |
| scopus.contributor.auid | 56829162700 | - |
| scopus.contributor.auid | 57197735572 | - |
| scopus.contributor.auid | 8652019200 | - |
| scopus.contributor.auid | 8933829900 | - |
| scopus.contributor.auid | 54880531200 | - |
| scopus.contributor.auid | 56548968900 | - |
| scopus.contributor.auid | 15042645500 | - |
| scopus.contributor.auid | 57222076348 | - |
| scopus.contributor.auid | 58046200700 | - |
| scopus.contributor.auid | 59454607400 | - |
| scopus.contributor.auid | 57199421725 | - |
| scopus.contributor.auid | 59134232000 | - |
| scopus.contributor.auid | 57205404526 | - |
| scopus.contributor.auid | 22333654100 | - |
| scopus.contributor.auid | 55369471300 | - |
| scopus.contributor.auid | 6506330957 | - |
| scopus.contributor.auid | 56982845800 | - |
| scopus.contributor.auid | 57194974345 | - |
| scopus.contributor.auid | 57219589249 | - |
| scopus.contributor.auid | 57211109229 | - |
| scopus.contributor.auid | 27667722000 | - |
| scopus.contributor.auid | 6505791173 | - |
| scopus.contributor.auid | 57218937567 | - |
| scopus.contributor.auid | 57205401241 | - |
| scopus.contributor.auid | 57140569700 | - |
| scopus.contributor.auid | 55258287100 | - |
| scopus.contributor.auid | 14058464000 | - |
| scopus.contributor.auid | 8835805500 | - |
| scopus.contributor.auid | 59134232100 | - |
| scopus.contributor.auid | 24386059900 | - |
| scopus.contributor.auid | 57015236100 | - |
| scopus.contributor.auid | 57226647109 | - |
| scopus.contributor.auid | 55842078100 | - |
| scopus.contributor.auid | 57579634800 | - |
| scopus.contributor.auid | 25121446200 | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Czech Republic | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | Bulgaria | - |
| scopus.contributor.country | Poland | - |
| scopus.contributor.country | Germany | - |
| scopus.contributor.country | Poland | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Portugal | - |
| scopus.contributor.country | Iceland | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Spain | - |
| scopus.contributor.country | Spain | - |
| scopus.contributor.country | Latvia | - |
| scopus.contributor.country | France | - |
| scopus.contributor.country | Greece | - |
| scopus.contributor.country | Netherlands | - |
| scopus.contributor.country | Spain | - |
| scopus.contributor.country | Estonia | - |
| scopus.contributor.country | Ukraine | - |
| scopus.contributor.country | Hungary | - |
| scopus.contributor.country | Spain | - |
| scopus.contributor.country | Estonia | - |
| scopus.contributor.country | Denmark | - |
| scopus.contributor.country | Bulgaria | - |
| scopus.contributor.country | Norway | - |
| scopus.contributor.country | Finland | - |
| scopus.contributor.country | United Kingdom | - |
| scopus.contributor.country | Spain | - |
| scopus.contributor.country | Austria | - |
| scopus.contributor.country | Sweden | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 103854668 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 104417131 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 125212332 | - |
| scopus.contributor.dptid | 131805289 | - |
| scopus.contributor.dptid | 104786102 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 108255548 | - |
| scopus.contributor.dptid | 105729190 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 129214040 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 113098505 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 132232694 | - |
| scopus.contributor.dptid | 131272211 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 104565667 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 128336596 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 131272211 | - |
| scopus.contributor.dptid | 127562805 | - |
| scopus.contributor.dptid | 103243768 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Tomaž | - |
| scopus.contributor.name | Matyáš | - |
| scopus.contributor.name | Nikola | - |
| scopus.contributor.name | Taja | - |
| scopus.contributor.name | Paul | - |
| scopus.contributor.name | Petya | - |
| scopus.contributor.name | Maciej | - |
| scopus.contributor.name | Çağrı | - |
| scopus.contributor.name | Danijel | - |
| scopus.contributor.name | Katja | - |
| scopus.contributor.name | Jure | - |
| scopus.contributor.name | Peter | - |
| scopus.contributor.name | Tommaso | - |
| scopus.contributor.name | José | - |
| scopus.contributor.name | Starkaður | - |
| scopus.contributor.name | Roberto | - |
| scopus.contributor.name | Núria | - |
| scopus.contributor.name | María | - |
| scopus.contributor.name | Roberts | - |
| scopus.contributor.name | Sascha | - |
| scopus.contributor.name | Maria | - |
| scopus.contributor.name | Ruben | - |
| scopus.contributor.name | Mikel | - |
| scopus.contributor.name | Neeme | - |
| scopus.contributor.name | Anna | - |
| scopus.contributor.name | Noémi | - |
| scopus.contributor.name | Carmen | - |
| scopus.contributor.name | Martin | - |
| scopus.contributor.name | Costanza | - |
| scopus.contributor.name | Kiril | - |
| scopus.contributor.name | Lars Magne | - |
| scopus.contributor.name | Jouni | - |
| scopus.contributor.name | John | - |
| scopus.contributor.name | Adina Ioana | - |
| scopus.contributor.name | Tanja | - |
| scopus.contributor.name | Väinö | - |
| scopus.contributor.name | Darja | - |
| scopus.contributor.subaffiliation | Department of Knowledge Technologies; | - |
| scopus.contributor.subaffiliation | Institute of Formal and Applied Linguistics;Faculty of Mathematics and Physics; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Department of Knowledge Technologies; | - |
| scopus.contributor.subaffiliation | UCREL NLP research group; | - |
| scopus.contributor.subaffiliation | Institute of Information and Communication Technologies; | - |
| scopus.contributor.subaffiliation | Institute of Computer Science; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Department of Multimedia; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Department of Knowledge Technologies; | - |
| scopus.contributor.subaffiliation | Institute of Legal Informatics and Judicial Systems; | - |
| scopus.contributor.subaffiliation | School of Arts and Humanities - Centre of Linguistics; | - |
| scopus.contributor.subaffiliation | Department of Icelandic; | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale; | - |
| scopus.contributor.subaffiliation | Department of Translation and Language Sciences; | - |
| scopus.contributor.subaffiliation | Departamento de Traducción y Comunicación; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Praxiling UMR 5267 CNRS; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Information Retrieval Lab; | - |
| scopus.contributor.subaffiliation | HiTZ Basque Center for Language Technology;Ixa; | - |
| scopus.contributor.subaffiliation | Institute of Computer Science; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Language Technology Research Group; | - |
| scopus.contributor.subaffiliation | Galician Language Institute; | - |
| scopus.contributor.subaffiliation | Johan Skytte Institute of Political Studies; | - |
| scopus.contributor.subaffiliation | Department of Nordic Studies and Linguistics; | - |
| scopus.contributor.subaffiliation | Institute of Information and Communication Technologies; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Helsinki Institute for Social Sciences and Humanities; | - |
| scopus.contributor.subaffiliation | UCREL NLP research group; | - |
| scopus.contributor.subaffiliation | Galician Language Institute; | - |
| scopus.contributor.subaffiliation | Austrian Centre for Digital Humanities and Cultural Heritage; | - |
| scopus.contributor.subaffiliation | Department of Statistics; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.surname | Erjavec | - |
| scopus.contributor.surname | Kopp | - |
| scopus.contributor.surname | Ljubešić | - |
| scopus.contributor.surname | Kuzman | - |
| scopus.contributor.surname | Rayson | - |
| scopus.contributor.surname | Osenova | - |
| scopus.contributor.surname | Ogrodniczuk | - |
| scopus.contributor.surname | Çöltekin | - |
| scopus.contributor.surname | Koržinek | - |
| scopus.contributor.surname | Meden | - |
| scopus.contributor.surname | Skubic | - |
| scopus.contributor.surname | Rupnik | - |
| scopus.contributor.surname | Agnoloni | - |
| scopus.contributor.surname | Aires | - |
| scopus.contributor.surname | Barkarson | - |
| scopus.contributor.surname | Bartolini | - |
| scopus.contributor.surname | Bel | - |
| scopus.contributor.surname | Calzada Pérez | - |
| scopus.contributor.surname | Darģis | - |
| scopus.contributor.surname | Diwersy | - |
| scopus.contributor.surname | Gavriilidou | - |
| scopus.contributor.surname | van Heusden | - |
| scopus.contributor.surname | Iruskieta | - |
| scopus.contributor.surname | Kahusk | - |
| scopus.contributor.surname | Kryvenko | - |
| scopus.contributor.surname | Ligeti-Nagy | - |
| scopus.contributor.surname | Magariños | - |
| scopus.contributor.surname | Mölder | - |
| scopus.contributor.surname | Navarretta | - |
| scopus.contributor.surname | Simov | - |
| scopus.contributor.surname | Tungland | - |
| scopus.contributor.surname | Tuominen | - |
| scopus.contributor.surname | Vidler | - |
| scopus.contributor.surname | Vladu | - |
| scopus.contributor.surname | Wissik | - |
| scopus.contributor.surname | Yrjänäinen | - |
| scopus.contributor.surname | Fišer | - |
| scopus.date.issued | 2025 | * |
| scopus.description.abstracteng | The paper presents the results of the ParlaMint II project, which comprise comparable corpora of parliamentary debates of 29 European countries and autonomous regions, covering at least the period from 2015 to 2022, and containing over 1 billion words. The corpora are uniformly encoded, contain rich metadata about their 24 thousand speakers, and are linguistically annotated up to the level of Universal Dependencies syntax and named entities. The paper focuses on the enhancement made since the ParlaMint I project and presents the compilation of the corpora, including the encoding infrastructure, use of GitHub, the production of individual corpora, the common pipeline for producing their distribution, and use of CLARIN services for dissemination. It then gives a quantitative overview of the produced corpora, followed by the qualitative additions made within the ParlaMint II project, namely metadata localisation, the addition of new metadata, such as the political orientation of political parties, the machine translation of the corpora to English and its tagging with semantic classes, and the production of pilot speech corpora. Finally, outreach activities and further work are discussed. | * |
| scopus.description.allpeopleoriginal | Erjavec T.; Kopp M.; Ljubesic N.; Kuzman T.; Rayson P.; Osenova P.; Ogrodniczuk M.; Coltekin C.; Korzinek D.; Meden K.; Skubic J.; Rupnik P.; Agnoloni T.; Aires J.; Barkarson S.; Bartolini R.; Bel N.; Calzada Perez M.; Dargis R.; Diwersy S.; Gavriilidou M.; van Heusden R.; Iruskieta M.; Kahusk N.; Kryvenko A.; Ligeti-Nagy N.; Magarinos C.; Molder M.; Navarretta C.; Simov K.; Tungland L.M.; Tuominen J.; Vidler J.; Vladu A.I.; Wissik T.; Yrjanainen V.; Fiser D. | * |
| scopus.differences | scopus.relation.lastpage | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.firstpage | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.relation.issue | * |
| scopus.differences | scopus.date.issued | * |
| scopus.differences | scopus.identifier.doi | * |
| scopus.differences | scopus.relation.volume | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.funding.funders | 501100001734 - Københavns Universitet; 501100002341 - Research Council of Finland; 501100000780 - European Commission; 501100015068 - Universidade de Santiago de Compostela; 501100001822 - Österreichische Akademie der Wissenschaften; 501100010801 - Xunta de Galicia; 501100004382 - Polska Akademia Nauk; 501100003451 - Euskal Herriko Unibertsitatea; 501100008530 - European Regional Development Fund; 501100004329 - The Slovenian Research and Innovation Agency; 501100004329 - The Slovenian Research and Innovation Agency; 501100001823 - Ministerstvo Školství, Mládeže a Tělovýchovy; 501100001823 - Ministerstvo Školství, Mládeže a Tělovýchovy; 501100004837 - Ministerio de Ciencia e Innovación; 501100004837 - Ministerio de Ciencia e Innovación; 501100004569 - Ministerstwo Edukacji i Nauki; 501100004569 - Ministerstwo Edukacji i Nauki; 501100001871 - Fundação para a Ciência e a Tecnologia; 501100001871 - Fundação para a Ciência e a Tecnologia; 501100005992 - Ministry of Education and Science; 501100005992 - Ministry of Education and Science; | * |
| scopus.funding.ids | J7-4642; LM2023062; PID2019-108866RB-I00 / AEI / 10.13039/501100011033; 2022/WK/09; P6-0411; N6-0099; N6-0288; P2-0103; P6-0436; UIDP/00214/2020; 2022–2024; DO1-301/17.12.21; | * |
| scopus.identifier.doi | 10.1007/s10579-024-09798-w | * |
| scopus.identifier.eissn | 1574-0218 | * |
| scopus.identifier.pui | 2032781314 | * |
| scopus.identifier.scopus | 2-s2.0-85213520565 | * |
| scopus.journal.sourceid | 145663 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Springer Science and Business Media B.V. | * |
| scopus.relation.firstpage | 2071 | * |
| scopus.relation.issue | 3 | * |
| scopus.relation.lastpage | 2102 | * |
| scopus.relation.volume | 59 | * |
| scopus.subject.keywords | Comparable corpora; Parliamentary proceedings; TEI; | * |
| scopus.title | ParlaMint II: advancing comparable parliamentary corpora across Europe | * |
| scopus.titleeng | ParlaMint II: advancing comparable parliamentary corpora across Europe | * |
| Appears in categories: | 01.01 Journal article | |

| File | Size | Format |
|---|---|---|
| parla_mint_ii_advancing_comparable_parliamentary_corpora_across_europe-1.pdf (authorized users only; License: NOT PUBLIC - Private/restricted access) | 1.1 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


