POS tagging and lemmatization of historical varieties of languages. The challenge of old Italian
Manuel Favaro; Marco Biffi; Simonetta Montemagni
2023
Abstract
The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian and reports, for both tasks, the results of experiments carried out in a classical supervised domain adaptation scenario using the diachronic and typologically differentiated corpus built for the "Vocabolario Dinamico dell’Italiano Moderno" (VoDIM). For POS tagging, the effectiveness of retrained models is illustrated and substantiated with quantitative data, with particular attention to the annotation results obtained for specific language evolution stages, domains and textual genres. For lemmatization, several customized models have been developed, including lexicon-assisted ones and models retrained on annotated historical texts. In both cases, a detailed error analysis is provided.

| DC field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | IJCOL | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Manuel Favaro | en |
| dc.authority.people | Marco Biffi | en |
| dc.authority.people | Simonetta Montemagni | en |
| dc.authority.project | PRR.AP019.006 | en |
| dc.collection.id.s | b3f88f24-048a-4e43-8ab1-6697b90e068e | * |
| dc.collection.name | 01.01 Articolo in rivista | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2025/01/20 16:57:44 | - |
| dc.date.available | 2025/01/20 16:57:44 | - |
| dc.date.firstsubmission | 2025/01/15 16:38:22 | * |
| dc.date.issued | 2023 | - |
| dc.date.submission | 2025/01/15 16:38:22 | * |
| dc.description.allpeople | Favaro, Manuel; Biffi, Marco; Montemagni, Simonetta | - |
| dc.description.allpeopleoriginal | Manuel Favaro, Marco Biffi, Simonetta Montemagni | en |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.doi | 10.4000/ijcol.1325 | en |
| dc.identifier.scopus | 2-s2.0-85206124394 | en |
| dc.identifier.source | manual | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/526790 | - |
| dc.identifier.url | https://journals.openedition.org/ijcol/1325 | en |
| dc.identifier.url | https://www.aaccademia.it/ita/titolo?ref=1676 | en |
| dc.language.iso | eng | en |
| dc.relation.firstpage | 99 | en |
| dc.relation.issue | 2 | en |
| dc.relation.lastpage | 120 | en |
| dc.relation.medium | Electronic | en |
| dc.relation.numberofpages | 22 | en |
| dc.relation.projectAcronym | CHANGES_Spoke_3 | en |
| dc.relation.projectAwardNumber | - | en |
| dc.relation.projectAwardTitle | Cultural Heritage Active Innovation for Sustainable Society (CHANGES) | en |
| dc.relation.projectFunderName | - | en |
| dc.relation.projectFundingStream | - | en |
| dc.relation.volume | 9 | en |
| dc.subject.keywordseng | Historical Varieties of Italian, POS-Tagging, Lemmatization | - |
| dc.subject.singlekeyword | Historical Varieties of Italian | * |
| dc.subject.singlekeyword | POS-Tagging | * |
| dc.subject.singlekeyword | Lemmatization | * |
| dc.title | POS tagging and lemmatization of historical varieties of languages. The challenge of old Italian | en |
| dc.type.circulation | International | en |
| dc.type.driver | info:eu-repo/semantics/article | - |
| dc.type.full | 01 Contributo su Rivista::01.01 Articolo in rivista | it |
| dc.type.impactfactor | yes | en |
| dc.type.miur | 262 | - |
| dc.type.referee | Anonymous experts | en |
| iris.mediafilter.data | 2025/04/12 03:16:21 | * |
| iris.orcid.lastModifiedDate | 2025/02/05 10:39:46 | * |
| iris.orcid.lastModifiedMillisecond | 1738748386443 | * |
| iris.scopus.extIssued | 2023 | - |
| iris.scopus.extTitle | POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.bestoahost | publisher | * |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.4000/ijcol.1325 | * |
| iris.unpaywall.hosttype | publisher | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.journalisindoaj | true | * |
| iris.unpaywall.landingpage | https://doi.org/10.4000/ijcol.1325 | * |
| iris.unpaywall.license | cc-by-nc-nd | * |
| iris.unpaywall.metadataCallLastModified | 11/02/2026 04:00:58 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1770778858303 | - |
| iris.unpaywall.oastatus | gold | * |
| iris.unpaywall.pdfurl | https://journals.openedition.org/ijcol/pdf/1325 | * |
| scopus.authority.ancejournal | IJCOL###2499-4553 | * |
| scopus.category | 1709 | * |
| scopus.category | 3310 | * |
| scopus.category | 1703 | * |
| scopus.category | 1702 | * |
| scopus.contributor.affiliation | CNR | - |
| scopus.contributor.affiliation | Accademia della Crusca | - |
| scopus.contributor.affiliation | CNR | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 116691777 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.auid | 57220923328 | - |
| scopus.contributor.auid | 57211925643 | - |
| scopus.contributor.auid | 15056781100 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Manuel | - |
| scopus.contributor.name | Marco | - |
| scopus.contributor.name | Simonetta | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale “Antonio Zampolli”; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale “Antonio Zampolli”; | - |
| scopus.contributor.surname | Favaro | - |
| scopus.contributor.surname | Biffi | - |
| scopus.contributor.surname | Montemagni | - |
| scopus.date.issued | 2023 | * |
| scopus.description.allpeopleoriginal | Favaro M.; Biffi M.; Montemagni S. | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.document.type | ar | * |
| scopus.document.types | ar | * |
| scopus.funding.funders | 501100009888 - Regione Toscana; | * |
| scopus.funding.ids | POR FSE 2014 - 2020; | * |
| scopus.identifier.doi | 10.4000/ijcol.1325 | * |
| scopus.identifier.eissn | 2499-4553 | * |
| scopus.identifier.pui | 2031365048 | * |
| scopus.identifier.scopus | 2-s2.0-85206124394 | * |
| scopus.journal.sourceid | 21101252471 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Accademia University Press | * |
| scopus.relation.firstpage | 99 | * |
| scopus.relation.issue | 2 | * |
| scopus.relation.lastpage | 120 | * |
| scopus.relation.volume | 9 | * |
| scopus.title | POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian | * |
| scopus.titleeng | POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian | * |

Appears in collections: 01.01 Articolo in rivista

| File | Description | Size | Format |
|---|---|---|---|
| IJCoL_9_2_05_Favaro-Biffi-Montemagni.pdf | Open access; publisher's version (PDF); Creative Commons license | 193.78 kB | Adobe PDF |

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


