CNR Institutional Research Information System

Voices of the Great War: A Richly Annotated Corpus of Italian Texts on the First World War

Alessandro Lenci; Simonetta Montemagni; Federico Boschetti; Irene De Felice; Stefano dei Rossi; Felice Dell'Orletta; Michele Di Giorgio; Martina Miliani; Lucia C. Passaro; Angelica Puddu; Giulia Venturi; Nicola Labanca

2020

Abstract

Voci della Grande Guerra ("Voices of the Great War") is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view, it accounts for the wide range of varieties in which Italian was articulated in that period, along the diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) dimensions. Second, from the historical perspective, its collection of texts belonging to different genres represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along various dimensions, corresponding to the textual genre, the language variety used, the author type and the type of content conveyed. The corpus is annotated with lemmas, parts of speech, terminology, and named entities. Significant corpus samples representative of the different "voices" have also been enriched with meta-linguistic and syntactic information. The layer of syntactic annotation forms the first nucleus of an Italian historical treebank complying with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the web interface for navigating it.
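To make the annotation layers mentioned in the abstract more concrete, the following is a minimal sketch of how lemma, part-of-speech and dependency information can be read from CoNLL-U, the tab-separated format used by Universal Dependencies treebanks. It assumes the syntactic layer is available in CoNLL-U form, which this record does not confirm. The sample sentence, its analysis, and the names `Token`, `SAMPLE` and `parse_conllu` are hypothetical and are not taken from the corpus or its tools.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    """Subset of the ten CoNLL-U columns used in this sketch."""
    index: int
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str


# Hypothetical sample in CoNLL-U layout (ten tab-separated columns per token).
# The sentence and its analysis are illustrative, not taken from the corpus.
SAMPLE = "\n".join([
    "# text = I soldati scrivono lettere.",
    "1\tI\til\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tsoldati\tsoldato\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tscrivono\tscrivere\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tlettere\tlettera\tNOUN\t_\t_\t3\tobj\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_conllu(block: str) -> list[Token]:
    """Parse one CoNLL-U sentence block into a list of Token tuples."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
        if not cols[0].isdigit():
            continue
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    return tokens


tokens = parse_conllu(SAMPLE)
lemmas = [t.lemma for t in tokens]
pos_counts = Counter(t.upos for t in tokens)
root = next(t for t in tokens if t.head == 0)
```

Here `lemmas` holds the lemma layer, `pos_counts` tallies the part-of-speech tags, and `root` is the token whose head is 0, that is, the syntactic root of the sentence. Other layers named in the abstract, such as terminology and named entities, are not covered by this sketch.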

Year: 2020
Organizational units: Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Language(s): English
Conference title: Conference on Language Resources and Evaluation (LREC)
First page: 911
Last page: 918
Number of pages: 8
ISBN: 979-10-95546-34-4
URL: https://www.aclweb.org/anthology/2020.lrec-1.114.pdf
Publisher: European Language Resources Association (ELRA)
Publisher city: Paris
Publisher country: France
Peer reviewed: Yes, but type not specified
Conference dates: 11-16 May 2020
Keywords: Historical Corpora; Linguistic and Meta-linguistic Annotation; Information Extraction
Number of authors: 12
Full text: none
MIUR login type: 273
Type: info:eu-repo/semantics/conferenceObject
Type: 04 Conference contribution::04.01 Contribution in conference proceedings
Appears in types: 04.01 Contribution in conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/384922
