CNR Institutional Research Information System

Correctly identifying characters and substrings of words should be a basic but essential ability of any Language Model that aims to proficiently understand and produce language. Despite so, the majority of Pre-trained Language Models (PLMs) are "character-blind" and struggle in spelling tasks, although they still seem to acquire some character knowledge during pre-training, a phenomenon dubbed Spelling Miracle. To shed light on this phenomenon, we systematically evaluate a range of PLMs with different parameter sizes using a controlled binary substring identification task. Through a series of experiments, we propose the first comprehensive investigation on where, when, and how PLMs develop awareness of characters and substrings, with a particular linguistic focus on morphemic units such as prefixes, suffixes, and roots.

Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models

Ciaccio C.;Sartor M.;Miaschi A.;Dell'Orletta F.

2025

Abstract

Correctly identifying characters and substrings of words should be a basic but essential ability of any Language Model that aims to proficiently understand and produce language. Despite so, the majority of Pre-trained Language Models (PLMs) are "character-blind" and struggle in spelling tasks, although they still seem to acquire some character knowledge during pre-training, a phenomenon dubbed Spelling Miracle. To shed light on this phenomenon, we systematically evaluate a range of PLMs with different parameter sizes using a controlled binary substring identification task. Through a series of experiments, we propose the first comprehensive investigation on where, when, and how PLMs develop awareness of characters and substrings, with a particular linguistic focus on morphemic units such as prefixes, suffixes, and roots.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2025
			
	Strutture organizzative
	
				Istituto di linguistica computazionale "Antonio Zampolli" - ILC
			
	Parole chiave
	
				Large Language Models (LLMs)
Interpretability
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
2025.findings-acl.593.pdf accesso aperto Licenza: Creative commons Dimensione 2.68 MB Formato Adobe PDF Visualizza/Apri	2.68 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/570461

Citazioni

ND

3

ND

social impact