CNR Institutional Research Information System

This survey provides an overview of the challenges of misspellings in natural language processing (NLP). Misspellings are ubiquitous in digital communication, and even if humans can generally interpret misspelt text, NLP models frequently struggle to handle it: this causes a decline in performance in common tasks like text classification and machine translation. In this paper, we reconstruct a history of misspellings as a scientific problem. We then discuss the latest advancements to address the challenge of misspellings in NLP. Main strategies to mitigate the effect of misspellings include data augmentation, double step, character-order agnostic, and tuple-based methods, among others. This survey also examines dedicated data challenges and competitions to spur progress in the field. Critical safety and ethical concerns are also examined, for example, the voluntary use of misspellings to inject malicious messages and hate speech on social networks. The survey also explores psycholinguistic perspectives on how humans process misspellings, potentially informing innovative computational techniques for text normalisation and representation. Additionally, the survey explores the challenges that misspellings pose in multilingual contexts. Finally, the misspelling-related challenges and opportunities associated with modern large language models are also analysed, including benchmarks, datasets and performances of the most prominent language models against misspellings. This survey provides a comprehensive review of recent research on misspellings and aims to serve as a valuable resource for researchers seeking to get up to speed on this problem within the rapidly evolving landscape of NLP.

Misspellings in natural language processing: a survey of recent literature

Sperduti Gianluca^{Writing – Original Draft Preparation};Moreo Alejandro^Supervision

2026

Abstract

This survey provides an overview of the challenges of misspellings in natural language processing (NLP). Misspellings are ubiquitous in digital communication, and even if humans can generally interpret misspelt text, NLP models frequently struggle to handle it: this causes a decline in performance in common tasks like text classification and machine translation. In this paper, we reconstruct a history of misspellings as a scientific problem. We then discuss the latest advancements to address the challenge of misspellings in NLP. Main strategies to mitigate the effect of misspellings include data augmentation, double step, character-order agnostic, and tuple-based methods, among others. This survey also examines dedicated data challenges and competitions to spur progress in the field. Critical safety and ethical concerns are also examined, for example, the voluntary use of misspellings to inject malicious messages and hate speech on social networks. The survey also explores psycholinguistic perspectives on how humans process misspellings, potentially informing innovative computational techniques for text normalisation and representation. Additionally, the survey explores the challenges that misspellings pose in multilingual contexts. Finally, the misspelling-related challenges and opportunities associated with modern large language models are also analysed, including benchmarks, datasets and performances of the most prominent language models against misspellings. This survey provides a comprehensive review of recent research on misspellings and aims to serve as a valuable resource for researchers seeking to get up to speed on this problem within the rapidly evolving landscape of NLP.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2026
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Parole chiave
	
				Misspellings
Text normalisation
User-generated content
Data augmentation
Hate speech detection
			
	Appare nelle tipologie:
	
				01.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
MisspellingsSurvey.NLP2026.pdf accesso aperto Descrizione: Misspellings in natural language processing: A survey of recent literature Tipologia: Versione Editoriale (PDF) Licenza: Creative commons Dimensione 765.05 kB Formato Adobe PDF Visualizza/Apri	765.05 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/573684

Citazioni

ND

0

ND

social impact