Towards the Creation of a Diachronic Corpus for Italian: a Case Study on the GDLI Quotations
Manuel Favaro; Elisa Guadagnini; Eva Sassolini; Marco Biffi; Simonetta Montemagni
2022
Abstract
In this paper we describe experiments on a corpus derived from an authoritative historical Italian dictionary, the Grande dizionario della lingua italiana (‘Great Dictionary of the Italian Language’, GDLI for short). Thanks to the digitization and structuring of this dictionary, we were able to set up the first nucleus of a diachronic annotated corpus that selects, according to specific criteria and distinguishing between prose and poetry, some of the quotations that illustrate the different definitions and sub-definitions within the entries. The GDLI presents a huge collection of quotations covering the entire history of the Italian language, ranging from the Middle Ages to the present day. The corpus was enriched with linguistic annotation and used to train and evaluate NLP models for POS tagging and lemmatization, with promising results.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Manuel Favaro | en |
| dc.authority.people | Elisa Guadagnini | en |
| dc.authority.people | Eva Sassolini | en |
| dc.authority.people | Marco Biffi | en |
| dc.authority.people | Simonetta Montemagni | en |
| dc.authority.project | DUS.AD017.115 / CNR4C - Regione Toscana | en |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.date.accessioned | 2025/02/25 17:54:40 | - |
| dc.date.available | 2025/02/25 17:54:40 | - |
| dc.date.firstsubmission | 2025/02/05 23:26:53 | * |
| dc.date.issued | 2022 | - |
| dc.date.submission | 2025/02/25 17:54:05 | * |
| dc.description.allpeople | Favaro, Manuel; Guadagnini, Elisa; Sassolini, Eva; Biffi, Marco; Montemagni, Simonetta | - |
| dc.description.allpeopleoriginal | Manuel Favaro, Elisa Guadagnini, Eva Sassolini, Marco Biffi, Simonetta Montemagni | en |
| dc.description.fulltext | open | en |
| dc.description.international | no | en |
| dc.description.numberofauthors | 5 | - |
| dc.identifier.isbn | 979-10-95546-78-8 | en |
| dc.identifier.source | manual | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/533922 | - |
| dc.identifier.url | http://www.lrec-conf.org/proceedings/lrec2022/workshops/LT4HALA/pdf/2022.lt4hala2022-1.13.pdf | en |
| dc.language.iso | eng | en |
| dc.publisher.country | FRA | en |
| dc.publisher.name | European Language Resources Association (ELRA) | en |
| dc.publisher.place | Paris | en |
| dc.relation.alleditors | Rachele Sprugnoli, Marco Passarotti | en |
| dc.relation.conferencedate | 20-25/06/2022 | en |
| dc.relation.conferencename | 2nd Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2022) | en |
| dc.relation.conferenceplace | Marseille | en |
| dc.relation.firstpage | 94 | en |
| dc.relation.ispartofbook | Proceedings of the 2nd Workshop on Language Technologies for Historical and Ancient Languages | en |
| dc.relation.lastpage | 100 | en |
| dc.relation.numberofpages | 7 | en |
| dc.relation.projectAwardTitle | DUS.AD017.115 / CNR4C - Regione Toscana | en |
| dc.subject.keywordseng | Diachronic Corpus, Adaptation of Annotation Tools, Historical Dictionaries | - |
| dc.subject.singlekeyword | Diachronic Corpus | * |
| dc.subject.singlekeyword | Adaptation of Annotation Tools | * |
| dc.subject.singlekeyword | Historical Dictionaries | * |
| dc.title | Towards the Creation of a Diachronic Corpus for Italian: a Case Study on the GDLI Quotations | en |
| dc.type.circulation | International | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | it |
| dc.type.impactfactor | no | en |
| dc.type.invited | contribution | en |
| dc.type.miur | 273 | - |
| dc.type.referee | Anonymous experts | en |
| Appears in types: | 04.01 Contribution in conference proceedings | |

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2022.lt4hala-1.13.pdf | Open access | Publisher's version (PDF) | Creative Commons | 373.04 kB | Adobe PDF |


