CNR Institutional Research Information System

The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence

Silvio Calderaro; Alessio Miaschi; Felice Dell’Orletta

2025

Abstract

Recent progress in Large Language Models (LLMs) has led to impressive capabilities in Natural Language Generation (NLG). However, standard evaluation benchmarks often focus on surface-level performance and are predominantly English-centric, limiting insights into models’ deeper linguistic competences, especially in other languages. In this paper, we introduce OuLiBench, a novel benchmark inspired by the literary movement OuLiPo, designed to evaluate LLMs’ ability to generate Italian text under explicit linguistic constraints, ranging from morpho-syntactic requirements to creative and structural challenges. Our goal is to assess the extent to which LLMs can understand and manipulate language when guided by specific, sometimes artificial constraints. We evaluate a range of state-of-the-art models in both zero- and few-shot settings, comparing performance across constraint types and difficulty levels. Our results highlight significant variability across models and tasks, shedding light on the limits of controllable text generation and offering a new lens for probing LLMs’ generative and linguistic competence beyond traditional benchmarks.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Silvio Calderaro	en
dc.authority.people	Alessio Miaschi	en
dc.authority.people	Felice Dell’Orletta	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.date.accessioned	2026/03/03 16:54:21	-
dc.date.available	2026/03/03 16:54:21	-
dc.date.firstsubmission	2026/03/03 15:49:02	*
dc.date.issued	2025	-
dc.date.submission	2026/03/03 15:49:02	*
dc.description.allpeople	Calderaro, Silvio; Miaschi, Alessio; Dell’Orletta, Felice	-
dc.description.allpeopleoriginal	Silvio Calderaro, Alessio Miaschi, Felice Dell’Orletta	en
dc.description.fulltext	open	en
dc.description.numberofauthors	3	-
dc.identifier.source	manual	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/570746	-
dc.language.iso	eng	en
dc.relation.ispartofbook	Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)	en
dc.subject.keywordseng	Large Language Models, Benchmark, Evaluation, Controllable Text Generation	-
dc.subject.singlekeyword	Large Language Models	*
dc.subject.singlekeyword	Benchmark	*
dc.subject.singlekeyword	Evaluation	*
dc.subject.singlekeyword	Controllable Text Generation	*
dc.title	The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
iris.mediafilter.data	2026/03/04 02:52:26	*
iris.orcid.lastModifiedDate	2026/03/03 16:54:21	*
iris.orcid.lastModifiedMillisecond	1772553261221	*
iris.sitodocente.maxattempts	1	-
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

File	Size	Format
ulibench.pdf (open access; license: Creative Commons)	1.25 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570746
