Voices of the Great War: A Richly Annotated Corpus of Italian Texts on the First World War

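Alessandro Lenci; Simonetta Montemagni; Federico Boschetti; Irene De Felice; Stefano dei Rossi; Felice Dell'Orletta; Michele Di Giorgio; Martina Miliani; Lucia C. Passaro; Angelica Puddu; Giulia Venturi; Nicola Labanca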
2020

Abstract

Voci della Grande Guerra ("Voices of the Great War") is the first large corpus of Italian historical texts dating back to the period of the First World War. This corpus differs from other existing resources in several respects. First, from the linguistic point of view, it accounts for the wide range of varieties in which Italian was articulated in that period, namely from the diastratic (educated vs. uneducated writers), diaphasic (low/informal vs. high/formal registers) and diatopic (regional varieties, dialects) points of view. Second, from the historical perspective, through a collection of texts belonging to different genres, it represents different views on the war and the various styles of narrating war events and experiences. The final corpus is balanced along several dimensions: textual genre, language variety, author type and type of content conveyed. The corpus is annotated with lemmas, part-of-speech, terminology, and named entities. Substantial corpus samples representative of the different "voices" have also been enriched with meta-linguistic and syntactic information. The syntactic annotation layer forms the first nucleus of an Italian historical treebank compliant with the Universal Dependencies standard. The paper illustrates the final resource, the methodology and tools used to build it, and the web interface for navigating it.
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Alessandro Lenci it
dc.authority.people Simonetta Montemagni it
dc.authority.people Federico Boschetti it
dc.authority.people Irene De Felice it
dc.authority.people Stefano dei Rossi it
dc.authority.people Felice Dell'Orletta it
dc.authority.people Michele Di Giorgio it
dc.authority.people Martina Miliani it
dc.authority.people Lucia C. Passaro it
dc.authority.people Angelica Puddu it
dc.authority.people Giulia Venturi it
dc.authority.people Nicola Labanca it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno (Contribution in conference proceedings) *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/18 22:42:39 -
dc.date.available 2024/02/18 22:42:39 -
dc.date.issued 2020 -
dc.description.affiliations Università di Pisa; Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR); WebSoup s.n.c; Università di Siena -
dc.description.allpeople Lenci, Alessandro; Montemagni, Simonetta; Boschetti, Federico; De Felice, Irene; dei Rossi, Stefano; Dell'Orletta, Felice; Di Giorgio, Michele; Miliani, Martina; Passaro, Lucia C.; Puddu, Angelica; Venturi, Giulia; Labanca, Nicola -
dc.description.allpeopleoriginal Alessandro Lenci, Simonetta Montemagni, Federico Boschetti, Irene De Felice, Stefano dei Rossi, Felice Dell'Orletta, Michele Di Giorgio, Martina Miliani, Lucia C. Passaro, Angelica Puddu, Giulia Venturi, Nicola Labanca -
dc.description.fulltext none en
dc.description.numberofauthors 12 -
dc.identifier.isbn 979-10-95546-34-4 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/384922 -
dc.identifier.url https://www.aclweb.org/anthology/2020.lrec-1.114.pdf -
dc.language.iso eng -
dc.miur.last.status.update 2024-12-18T15:58:44Z *
dc.publisher.country FRA -
dc.publisher.name European Language Resources Association (ELRA) -
dc.publisher.place Paris -
dc.relation.conferencedate 11-16/05/2020 -
dc.relation.conferencename Conference on Language Resources and Evaluation (LREC) -
dc.relation.firstpage 911 -
dc.relation.lastpage 918 -
dc.relation.numberofpages 8 -
dc.subject.keywords Historical Corpora -
dc.subject.keywords Linguistic and Meta-linguistic Annotation -
dc.subject.keywords Information Extraction -
dc.subject.singlekeyword Historical Corpora *
dc.subject.singlekeyword Linguistic and Meta-linguistic Annotation *
dc.subject.singlekeyword Information Extraction *
dc.title Voices of the Great War: A Richly Annotated Corpus of Italian Texts on the First World War en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno (04 Conference contribution::04.01 Contribution in conference proceedings) it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 435958 -
iris.orcid.lastModifiedDate 2024/04/04 10:09:29 *
iris.orcid.lastModifiedMillisecond 1712218169692 *
iris.scopus.extIssued 2020 -
iris.scopus.extTitle Voices of the great war: A richly annotated corpus of Italian texts on the first world war -
iris.sitodocente.maxattempts 10 -
Appears in types: 04.01 Contributo in Atti di convegno (Contribution in conference proceedings)
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/384922