Converting and structuring a digital historical dictionary of Italian: a case study

Eva Sassolini; Anas Fahad Khan; Marco Biffi; Monica Monachini; Simonetta Montemagni
2019

Abstract

This paper describes ongoing work on the digitization of an authoritative historical dictionary of Italian, Il Grande Dizionario della Lingua Italiana (GDLI), with the specific aim of creating the prerequisites for advanced human-oriented querying. After discussing the general approach taken to extract and structure the GDLI contents, we report the encouraging results of a case study carried out on two volumes, selected because they raise different conversion issues. Content extraction and structuring is carried out through an iterative process based on hand-coded patterns: starting from the recognition of the entry headword, a series of truth conditions is tested, which allows the whole lexical entry to be built and progressively structured in successive steps. We have also started to design the representation of the extracted and structured entries in a standard format, encoded in TEI. An outline of an example entry is provided and illustrated to show what the end result will look like.
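The pattern-based process summarised in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the regular expressions, the function names `parse_entry` and `to_tei`, and the sample line are all assumptions made for illustration, and the sample line is an invented toy string, not text taken from the GDLI. Only the TEI element names (`entry`, `form`, `orth`, `gramGrp`, `pos`, `sense`, `def`) come from the TEI dictionary module.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical headword pattern: capitalised lemma, comma, part-of-speech abbreviation.
HEADWORD = re.compile(r"^(?P<lemma>[A-Z][a-z]+), (?P<pos>sf|sm|agg)\. (?P<body>.+)$")
# Hypothetical sense delimiter: " - 2. ", " - 3. ", and so on.
SENSE_BREAK = re.compile(r" - \d+\. ")


def parse_entry(line):
    """Return a dict for a line that opens an entry, or None otherwise."""
    match = HEADWORD.match(line)
    if match is None:  # first condition: the line must start with a headword
        return None
    # Later step: split the entry body into numbered senses.
    senses = [s.strip() for s in SENSE_BREAK.split(match.group("body")) if s.strip()]
    return {"lemma": match.group("lemma"), "pos": match.group("pos"), "senses": senses}


def to_tei(entry):
    """Serialise a parsed entry using TEI dictionary element names."""
    root = ET.Element("entry")
    form = ET.SubElement(root, "form", type="lemma")
    ET.SubElement(form, "orth").text = entry["lemma"]
    gram_grp = ET.SubElement(root, "gramGrp")
    ET.SubElement(gram_grp, "pos").text = entry["pos"]
    for number, text in enumerate(entry["senses"], start=1):
        sense = ET.SubElement(root, "sense", n=str(number))
        ET.SubElement(sense, "def").text = text
    return ET.tostring(root, encoding="unicode")


# Invented toy line, used only to exercise the sketch.
toy_line = "Casa, sf. Edificio destinato ad abitazione. - 2. Famiglia, stirpe."
print(to_tei(parse_entry(toy_line)))
```

Each stage works only on what the previous stage recognised, which mirrors the progressive structuring the abstract describes; a real historical dictionary would need far more patterns and error handling than this toy case.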
DC field Value Language
dc.authority.people Eva Sassolini it
dc.authority.people Anas Fahad Khan it
dc.authority.people Marco Biffi it
dc.authority.people Monica Monachini it
dc.authority.people Simonetta Montemagni it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/19 20:57:45 -
dc.date.available 2024/02/19 20:57:45 -
dc.date.issued 2019 -
dc.description.affiliations Istituto di Linguistica Computazionale "A. Zampolli" - CNR (Pisa, Italy), Accademia della Crusca (Firenze, Italy), Università degli Studi di Firenze (Italy) -
dc.description.allpeople Sassolini, Eva; Fahad Khan, Anas; Biffi, Marco; Monachini, Monica; Montemagni, Simonetta -
dc.description.allpeopleoriginal Eva Sassolini, Anas Fahad Khan, Marco Biffi, Monica Monachini and Simonetta Montemagni: -
dc.description.fulltext none en
dc.description.numberofauthors 5 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/389211 -
dc.identifier.url https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_35.pdf -
dc.language.iso eng -
dc.relation.conferencedate 1-3/10/2019 -
dc.relation.conferencename Electronic lexicography in the 21st century (eLex 2019): Smart Lexicography. -
dc.subject.keywords historical dictionaries; automatic acquisition; TEI representation -
dc.subject.singlekeyword historical dictionaries *
dc.subject.singlekeyword automatic acquisition *
dc.subject.singlekeyword TEI representation *
dc.title Converting and structuring a digital historical dictionary of Italian: a case study en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.ugov.descaux1 410154 -
iris.orcid.lastModifiedDate 2024/04/04 12:03:41 *
iris.orcid.lastModifiedMillisecond 1712225021649 *
iris.scopus.extIssued 2019 -
iris.scopus.extTitle Converting and structuring a digital historical dictionary of Italian: A case study -
iris.sitodocente.maxattempts 1 -