POS tagging and lemmatization of historical varieties of languages. The challenge of old Italian

Manuel Favaro; Marco Biffi; Simonetta Montemagni
2023

Abstract

The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian and reports, for both tasks, the results of experiments carried out in a classical supervised domain adaptation scenario using the diachronic and typologically differentiated corpus built for the "Vocabolario Dinamico dell’Italiano Moderno" (VoDIM). For POS tagging, the effectiveness of retrained models is illustrated and substantiated with quantitative data, with particular attention to the annotation results obtained for specific stages of language evolution, domains and textual genres. For lemmatization, different customized models have been developed, including lexicon-assisted models and models retrained on annotated historical texts. For both tasks, a detailed error analysis is provided.
DC Field Value Language
dc.authority.ancejournal IJCOL en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Manuel Favaro en
dc.authority.people Marco Biffi en
dc.authority.people Simonetta Montemagni en
dc.authority.project PRR.AP019.006 en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2025/01/20 16:57:44 -
dc.date.available 2025/01/20 16:57:44 -
dc.date.firstsubmission 2025/01/15 16:38:22 *
dc.date.issued 2023 -
dc.date.submission 2025/01/15 16:38:22 *
dc.description.allpeople Favaro, Manuel; Biffi, Marco; Montemagni, Simonetta -
dc.description.allpeopleoriginal Manuel Favaro, Marco Biffi, Simonetta Montemagni en
dc.description.fulltext open en
dc.description.numberofauthors 3 -
dc.identifier.doi 10.4000/ijcol.1325 en
dc.identifier.scopus 2-s2.0-85206124394 en
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/526790 -
dc.identifier.url https://journals.openedition.org/ijcol/1325 en
dc.identifier.url https://www.aaccademia.it/ita/titolo?ref=1676 en
dc.language.iso eng en
dc.relation.firstpage 99 en
dc.relation.issue 2 en
dc.relation.lastpage 120 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 22 en
dc.relation.projectAcronym CHANGES_Spoke_3 en
dc.relation.projectAwardNumber - en
dc.relation.projectAwardTitle Cultural Heritage Active Innovation for Sustainable Society (CHANGES) en
dc.relation.projectFunderName - en
dc.relation.projectFundingStream - en
dc.relation.volume 9 en
dc.subject.keywordseng Historical Varieties of Italian, POS-Tagging, Lemmatization -
dc.subject.singlekeyword Historical Varieties of Italian *
dc.subject.singlekeyword POS-Tagging *
dc.subject.singlekeyword Lemmatization *
dc.title POS tagging and lemmatization of historical varieties of languages. The challenge of old Italian en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.impactfactor si en
dc.type.miur 262 -
dc.type.referee Esperti anonimi en
iris.mediafilter.data 2025/04/12 03:16:21 *
iris.orcid.lastModifiedDate 2025/02/05 10:39:46 *
iris.orcid.lastModifiedMillisecond 1738748386443 *
iris.scopus.extIssued 2023 -
iris.scopus.extTitle POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoahost publisher *
iris.unpaywall.bestoaversion publishedVersion *
iris.unpaywall.doi 10.4000/ijcol.1325 *
iris.unpaywall.hosttype publisher *
iris.unpaywall.isoa true *
iris.unpaywall.journalisindoaj true *
iris.unpaywall.landingpage https://doi.org/10.4000/ijcol.1325 *
iris.unpaywall.license cc-by-nc-nd *
iris.unpaywall.metadataCallLastModified 11/02/2026 04:00:58 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1770778858303 -
iris.unpaywall.oastatus gold *
iris.unpaywall.pdfurl https://journals.openedition.org/ijcol/pdf/1325 *
scopus.authority.ancejournal IJCOL###2499-4553 *
scopus.category 1709 *
scopus.category 3310 *
scopus.category 1703 *
scopus.category 1702 *
scopus.contributor.affiliation CNR -
scopus.contributor.affiliation Accademia della Crusca -
scopus.contributor.affiliation CNR -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 116691777 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 57220923328 -
scopus.contributor.auid 57211925643 -
scopus.contributor.auid 15056781100 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Manuel -
scopus.contributor.name Marco -
scopus.contributor.name Simonetta -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “Antonio Zampolli”; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “Antonio Zampolli”; -
scopus.contributor.surname Favaro -
scopus.contributor.surname Biffi -
scopus.contributor.surname Montemagni -
scopus.date.issued 2023 *
scopus.description.allpeopleoriginal Favaro M.; Biffi M.; Montemagni S. *
scopus.differences scopus.description.allpeopleoriginal *
scopus.document.type ar *
scopus.document.types ar *
scopus.funding.funders 501100009888 - Regione Toscana; *
scopus.funding.ids POR FSE 2014 - 2020; *
scopus.identifier.doi 10.4000/ijcol.1325 *
scopus.identifier.eissn 2499-4553 *
scopus.identifier.pui 2031365048 *
scopus.identifier.scopus 2-s2.0-85206124394 *
scopus.journal.sourceid 21101252471 *
scopus.language.iso eng *
scopus.publisher.name Accademia University Press *
scopus.relation.firstpage 99 *
scopus.relation.issue 2 *
scopus.relation.lastpage 120 *
scopus.relation.volume 9 *
scopus.title POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian *
scopus.titleeng POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian *
Appears in types: 01.01 Articolo in rivista (journal article)
Files in this item:
File: IJCoL_9_2_05_Favaro-Biffi-Montemagni.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 193.78 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/526790