<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/CINECAstyle.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-06-06T02:36:37Z</responseDate><request verb="GetRecord" identifier="oai:iris.cnr.it:20.500.14243/571223" metadataPrefix="oai_dc">https://iris.cnr.it/oai/request</request><GetRecord><record><header><identifier>oai:iris.cnr.it:20.500.14243/571223</identifier><datestamp>2026-03-05T01:37:05Z</datestamp><setSpec>ou_ou239</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Low- vs High-level Lemmatization for Historical Languages. A Case study on Italian</dc:title>
<dc:creator>Chiara Alzetta</dc:creator>
<dc:creator>Simonetta Montemagni</dc:creator>
<dc:contributor>Alzetta, Chiara</dc:contributor>
<dc:contributor>Montemagni, Simonetta</dc:contributor>
<dc:subject>Data-driven Lemmatization</dc:subject>
<dc:subject>Historical Italian</dc:subject>
<dc:subject>Universal Dependencies</dc:subject>
<dc:subject>Normalization</dc:subject>
<dc:description>Lemmatization remains a foundational yet challenging task in the processing of historical Italian texts, due to the complex interplay of orthographic, morphological, and diatopic variation. A crucial, yet often overlooked, aspect is the degree of normalization applied during lemmatization. A conservative approach preserves attested historical forms, ensuring greater linguistic fidelity but increasing data sparsity. Conversely, an abstract normalization strategy aligns historical variants with standardized contemporary lemmas, improving generalization but potentially introducing inaccurate mappings. In this paper, we present a comparative evaluation of conservative and normalized lemmatization strategies for historical Italian. To our knowledge, this is the first study to explicitly assess the impact of lemmatization strategies in the context of historical languages, particularly those that are morphologically rich. Our results indicate that high-level normalization offers a promising trade-off between precision and generalization.</dc:description>
<dc:date>2025</dc:date>
<dc:type>info:eu-repo/semantics/conferenceObject</dc:type>
<dc:identifier>https://hdl.handle.net/20.500.14243/571223</dc:identifier>
<dc:relation>info:eu-repo/semantics/altIdentifier/isbn/979-12-243-0587-3</dc:relation>
<dc:identifier>https://aclanthology.org/2025.clicit-1.4.pdf</dc:identifier>
<dc:language>eng</dc:language>
<dc:relation>ispartofbook:Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)</dc:relation>
<dc:relation>Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)</dc:relation>
<dc:relation>numberofpages:10</dc:relation>
<dc:format>ELETTRONICO</dc:format>
<dc:publisher>CEUR Workshop Proceedings</dc:publisher>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>