CNR Institutional Research Information System

Multilingual comparable corpora of parliamentary debates ParlaMint 2.1

Tomaž Erjavec;Maciej Ogrodniczuk;Petya Osenova;Nikola Ljubešić;Kiril Simov;Vladislava Grigorova;Michał Rudolf;Andrej Pančur;Matyáš Kopp;Starkaður Barkarson;Steinþor Steingrímsson;Henk van der Pol;Griet Depoorter;Jesse de Does;Bart Jongejan;Dorte Haltrup Hansen;Costanza Navarretta;María Calzada Pérez;Luciana D de Macedo;Ruben van Heusden;Maarten Marx;Çağrı Çöltekin;Matthew Coole;Tommaso Agnoloni;Francesca Frontini;Simonetta Montemagni;Valeria Quochi;Giulia Venturi;Manuela Ruisi;Carlo Marchetti;Roberto Battistoni;Miklós Sebők;Orsolya Ring;Roberts Darģis;Andrius Utka;Mindaugas Petkevičius;Monika Briediené;Tomas Krilavičius;Vaidas Morkevičius;Roberto Bartolini;Andrea Cimino;Sascha Diwersy;Giancarlo Luxardo;Paul Rayson

2021

Abstract

ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates, mostly starting in 2015 and extending to mid-2020, with each corpus about 20 million words in size. Sessions in the corpora are marked as belonging either to the COVID-19 period (after November 1st 2019) or to the "reference" period (before that date). The corpora carry extensive metadata, including aspects of the parliament and of the speakers (name, gender, MP status, party affiliation, party coalition/opposition). They are structured into time-stamped terms, sessions and meetings, and each speech is marked with its speaker and the speaker's role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions and applause. Some corpora have further information, e.g. the speakers' year of birth, links to their Wikipedia articles and their membership in various committees. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/) and have been validated against the compatible but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain-text version of the corpus, along with TSV metadata on the speeches. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project. A linguistically marked-up version of the corpus is available at http://hdl.handle.net/11356/1431.
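The split into a COVID-19 period and a reference period can be illustrated with a minimal Python sketch over speech-level TSV metadata. The column names (`speech_id`, `date`, `speaker_role`) and the sample rows are hypothetical and are not taken from the ParlaMint files. The abstract does not say how the cutoff day itself is classified, so the sketch assumes that 1 November 2019 counts as COVID-19 period.

```python
import csv
import io
from datetime import date

# Cutoff date stated in the abstract (November 1st 2019).
COVID_CUTOFF = date(2019, 11, 1)


def classify_period(session_date: date) -> str:
    """Label a session date as 'covid' or 'reference'.

    Assumption: the cutoff day itself is treated as 'covid'.
    """
    return "covid" if session_date >= COVID_CUTOFF else "reference"


# Hypothetical sample; real ParlaMint TSV files may use other column names.
SAMPLE_TSV = (
    "speech_id\tdate\tspeaker_role\n"
    "u1\t2018-05-14\tchair\n"
    "u2\t2019-11-01\tregular\n"
    "u3\t2020-03-20\tregular\n"
)


def count_by_period(tsv_text: str) -> dict[str, int]:
    """Count speeches per period from tab-separated metadata text."""
    counts = {"reference": 0, "covid": 0}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        period = classify_period(date.fromisoformat(row["date"]))
        counts[period] += 1
    return counts


print(count_by_period(SAMPLE_TSV))  # {'reference': 1, 'covid': 2}
```

To use this with an actual metadata file, the `date` key would need to be replaced by whatever column holds the session date in that file.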

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Tomaž Erjavec	en
dc.authority.people	Maciej Ogrodniczuk	en
dc.authority.people	Petya Osenova	en
dc.authority.people	Nikola Ljubešić	en
dc.authority.people	Kiril Simov	en
dc.authority.people	Vladislava Grigorova	en
dc.authority.people	Michał Rudolf	en
dc.authority.people	Andrej Pančur	en
dc.authority.people	Matyáš Kopp	en
dc.authority.people	Starkaður Barkarson	en
dc.authority.people	Steinþor Steingrímsson	en
dc.authority.people	Henk van der Pol	en
dc.authority.people	Griet Depoorter	en
dc.authority.people	Jesse de Does	en
dc.authority.people	Bart Jongejan	en
dc.authority.people	Dorte Haltrup Hansen	en
dc.authority.people	Costanza Navarretta	en
dc.authority.people	María Calzada Pérez	en
dc.authority.people	Luciana D de Macedo	en
dc.authority.people	Ruben van Heusden	en
dc.authority.people	Maarten Marx	en
dc.authority.people	Çağrı Çöltekin	en
dc.authority.people	Matthew Coole	en
dc.authority.people	Tommaso Agnoloni	en
dc.authority.people	Francesca Frontini	en
dc.authority.people	Simonetta Montemagni	en
dc.authority.people	Valeria Quochi	en
dc.authority.people	Giulia Venturi	en
dc.authority.people	Manuela Ruisi	en
dc.authority.people	Carlo Marchetti	en
dc.authority.people	Roberto Battistoni	en
dc.authority.people	Miklós Sebők	en
dc.authority.people	Orsolya Ring	en
dc.authority.people	Roberts Darģis	en
dc.authority.people	Andrius Utka	en
dc.authority.people	Mindaugas Petkevičius	en
dc.authority.people	Monika Briediené	en
dc.authority.people	Tomas Krilavičius	en
dc.authority.people	Vaidas Morkevičius	en
dc.authority.people	Roberto Bartolini	en
dc.authority.people	Andrea Cimino	en
dc.authority.people	Sascha Diwersy	en
dc.authority.people	Giancarlo Luxardo	en
dc.authority.people	Paul Rayson	en
dc.authority.project	ParlaMint	en
dc.collection.id.s	aa7ef5cb-003d-421c-b2c8-870fc44d02e5	*
dc.collection.name	05.10 Dataset	*
dc.contributor.appartenenza	Istituto di Informatica Giuridica e Sistemi Giudiziari - IGSG	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.appartenenza.mi	1108	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.date.accessioned	2024/02/19 12:00:45	-
dc.date.available	2024/02/19 12:00:45	-
dc.date.firstsubmission	2025/03/05 10:25:03	*
dc.date.issued	2021	-
dc.date.submission	2025/03/06 11:46:05	*
dc.description.abstracteng	ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project. Note that there also exists the linguistically marked-up version of the corpus, which is available at http://hdl.handle.net/11356/1431.	-
dc.description.affiliations	n.d.	-
dc.description.allpeople	Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej; Kopp, Matyáš; Barkarson, Starkaður; Steingrímsson, Steinþor; van der Pol, Henk; Depoorter, Griet; de Does, Jesse; Jongejan, Bart; Haltrup Hansen, Dorte; Navarretta, Costanza; Calzada Pérez, María; D de Macedo, Luciana; van Heusden, Ruben; Marx, Maarten; Çöltekin, Çağrı; Coole, Matthew; Agnoloni, Tommaso; Frontini, Francesca; Montemagni, Simonetta; Quochi, Valeria; Venturi, Giulia; Ruisi, Manuela; Marchetti, Carlo; Battistoni, Roberto; Sebők, Miklós; Ring, Orsolya; Darģis, Roberts; Utka, Andrius; Petkevičius, Mindaugas; Briediené, Monika; Krilavičius, Tomas; Morkevičius, Vaidas; Bartolini, Roberto; Cimino, Andrea; Diwersy, Sascha; Luxardo, Giancarlo; Rayson, Paul	-
dc.description.allpeopleoriginal	Tomaž Erjavec, Maciej Ogrodniczuk, Petya Osenova, Nikola Ljubešić, Kiril Simov, Vladislava Grigorova, Michał Rudolf, Andrej Pančur, Matyáš Kopp, Starkaður Barkarson, Steinþor Steingrímsson, Henk van der Pol, Griet Depoorter, Jesse de Does, Bart Jongejan, Dorte Haltrup Hansen, Costanza Navarretta, María Calzada Pérez, Luciana D. de Macedo, Ruben van Heusden, Maarten Marx, Çağrı Çöltekin, Matthew Coole, Tommaso Agnoloni, Francesca Frontini, Simonetta Montemagni, Valeria Quochi, Giulia Venturi, Manuela Ruisi, Carlo Marchetti, Roberto Battistoni, Miklós Sebők, Orsolya Ring, Roberts Darģis, Andrius Utka, Mindaugas Petkevičius, Monika Briediené, Tomas Krilavičius, Vaidas Morkevičius, Roberto Bartolini, Andrea Cimino, Sascha Diwersy, Giancarlo Luxardo, Paul Rayson	en
dc.description.fulltext	open	en
dc.description.international	si	en
dc.description.note	The dataset fully complies with the FAIR data principles.	en
dc.description.numberofauthors	44	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/446080	-
dc.identifier.url	http://hdl.handle.net/11356/1432	en
dc.language.iso	ita	en
dc.language.iso	bul	en
dc.language.iso	cze	en
dc.language.iso	dan	en
dc.language.iso	fre	en
dc.language.iso	ice	en
dc.language.iso	lav	en
dc.language.iso	lit	en
dc.language.iso	dut	en
dc.language.iso	pol	en
dc.language.iso	slv	en
dc.language.iso	spa	en
dc.language.iso	tur	en
dc.language.iso	hun	en
dc.relation.medium	ELETTRONICO	en
dc.relation.projectAcronym	ParlaMint	en
dc.relation.projectAwardNumber	-	en
dc.relation.projectAwardTitle	ParlaMint: Comparable and Interoperable Parliamentary Corpora	en
dc.relation.projectFunderName	CLARIN-ERIC	en
dc.relation.projectFundingStream	-	en
dc.subject.keywordsita	ParlaMint	-
dc.subject.keywordsita	ParlaCLARIN	-
dc.subject.keywordsita	dibattiti parlamentari	-
dc.subject.keywordsita	covid-19	-
dc.subject.keywordsita	discorso politico	-
dc.subject.keywordsita	CLARIN	-
dc.subject.singlekeyword	ParlaMint	*
dc.subject.singlekeyword	ParlaCLARIN	*
dc.subject.singlekeyword	dibattiti parlamentari	*
dc.subject.singlekeyword	covid-19	*
dc.subject.singlekeyword	discorso politico	*
dc.subject.singlekeyword	CLARIN	*
dc.title	Multilingual comparable corpora of parliamentary debates ParlaMint 2.1	en
dc.type.driver	info:eu-repo/semantics/other	-
dc.type.full	05 Altro::05.10 Dataset	it
dc.type.miur	295	-
dc.ugov.descaux1	463865	-
iris.mediafilter.data	2025/04/03 03:50:06	*
iris.orcid.lastModifiedDate	2025/03/06 11:47:34	*
iris.orcid.lastModifiedMillisecond	1741258054291	*
iris.sitodocente.maxattempts	1	-
Appears in types:	05.10 Dataset

Files in this item:

File	Size	Format
ParlaMint_MultilingualPlaintxt.pdf (open access). Description: Metadata descriptors of the dataset deposited in the CLARIN.SI repository. Type: Other attached material. License: Creative Commons	745.57 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/446080
