<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/CINECAstyle.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-06-15T02:27:09Z</responseDate><request verb="GetRecord" identifier="oai:iris.cnr.it:20.500.14243/570461" metadataPrefix="oai_dc">https://iris.cnr.it/oai/request</request><GetRecord><record><header><identifier>oai:iris.cnr.it:20.500.14243/570461</identifier><datestamp>2026-03-04T01:28:12Z</datestamp><setSpec>com_20.500.14243_46</setSpec><setSpec>com_20.500.14243_21</setSpec><setSpec>col_20.500.14243_47</setSpec><setSpec>ou_ou239</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models</dc:title>
<dc:creator>Ciaccio C.</dc:creator>
<dc:creator>Sartor M.</dc:creator>
<dc:creator>Miaschi A.</dc:creator>
<dc:creator>Dell'Orletta F.</dc:creator>
<dc:contributor>Ciaccio, C.</dc:contributor>
<dc:contributor>Sartor, M.</dc:contributor>
<dc:contributor>Miaschi, A.</dc:contributor>
<dc:contributor>Dell'Orletta, F.</dc:contributor>
<dc:subject>Large Language Models (LLMs)</dc:subject>
<dc:subject>Interpretability</dc:subject>
<dc:description>Correctly identifying the characters and substrings of words should be a basic but essential ability of any Language Model that aims to proficiently understand and produce language. Despite this, the majority of Pre-trained Language Models (PLMs) are "character-blind" and struggle in spelling tasks, although they still seem to acquire some character knowledge during pre-training, a phenomenon dubbed the Spelling Miracle. To shed light on this phenomenon, we systematically evaluate a range of PLMs with different parameter sizes using a controlled binary substring identification task. Through a series of experiments, we present the first comprehensive investigation of where, when, and how PLMs develop awareness of characters and substrings, with a particular linguistic focus on morphemic units such as prefixes, suffixes, and roots.</dc:description>
<dc:date>2025</dc:date>
<dc:type>info:eu-repo/semantics/conferenceObject</dc:type>
<dc:identifier>https://hdl.handle.net/20.500.14243/570461</dc:identifier>
<dc:identifier>10.18653/v1/2025.findings-acl.593</dc:identifier>
<dc:identifier>info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-105028561206</dc:identifier>
<dc:language>eng</dc:language>
<dc:relation>ispartofbook:Proceedings of the Annual Meeting of the Association for Computational Linguistics</dc:relation>
<dc:relation>63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025</dc:relation>
<dc:relation>firstpage:11361</dc:relation>
<dc:relation>lastpage:11372</dc:relation>
<dc:relation>numberofpages:12</dc:relation>
<dc:relation>serie:PROCEEDINGS OF THE CONFERENCE - ASSOCIATION FOR COMPUTATIONAL LINGUISTICS. MEETING</dc:relation>
<dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
<dc:publisher>Association for Computational Linguistics (ACL)</dc:publisher>
<dc:rights>license:Creative commons</dc:rights>
<dc:rights>license uri:http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>