Recent progress in Large Language Models (LLMs) has led to impressive capabilities in Natural Language Generation (NLG). However, standard evaluation benchmarks often focus on surface-level performance and are predominantly English-centric, limiting insights into models’ deeper linguistic competences, especially in other languages. In this paper, we introduce OuLiBench, a novel benchmark inspired by the literary movement OuLiPo, designed to evaluate LLMs’ ability to generate Italian text under explicit linguistic constraints, ranging from morpho-syntactic requirements to creative and structural challenges. Our goal is to assess the extent to which LLMs can understand and manipulate language when guided by specific, sometimes artificial constraints. We evaluate a range of state-of-the-art models in both zero-and few-shot settings, comparing performance across constraint types and difficulty levels. Our results highlight significant variability across models and tasks, shedding light on the limits of controllable text generation and offering a new lens for probing LLMs’ generative and linguistic competence beyond traditional benchmarks.

The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence

Alessio Miaschi;Felice Dell’Orletta
2025

Abstract

Recent progress in Large Language Models (LLMs) has led to impressive capabilities in Natural Language Generation (NLG). However, standard evaluation benchmarks often focus on surface-level performance and are predominantly English-centric, limiting insights into models’ deeper linguistic competences, especially in other languages. In this paper, we introduce OuLiBench, a novel benchmark inspired by the literary movement OuLiPo, designed to evaluate LLMs’ ability to generate Italian text under explicit linguistic constraints, ranging from morpho-syntactic requirements to creative and structural challenges. Our goal is to assess the extent to which LLMs can understand and manipulate language when guided by specific, sometimes artificial constraints. We evaluate a range of state-of-the-art models in both zero-and few-shot settings, comparing performance across constraint types and difficulty levels. Our results highlight significant variability across models and tasks, shedding light on the limits of controllable text generation and offering a new lens for probing LLMs’ generative and linguistic competence beyond traditional benchmarks.
Campo DC Valore Lingua
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Silvio Calderaro en
dc.authority.people Alessio Miaschi en
dc.authority.people Felice Dell’Orletta en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2026/03/03 16:54:21 -
dc.date.available 2026/03/03 16:54:21 -
dc.date.firstsubmission 2026/03/03 15:49:02 *
dc.date.issued 2025 -
dc.date.submission 2026/03/03 15:49:02 *
dc.description.abstracteng Recent progress in Large Language Models (LLMs) has led to impressive capabilities in Natural Language Generation (NLG). However, standard evaluation benchmarks often focus on surface-level performance and are predominantly English-centric, limiting insights into models’ deeper linguistic competences, especially in other languages. In this paper, we introduce OuLiBench, a novel benchmark inspired by the literary movement OuLiPo, designed to evaluate LLMs’ ability to generate Italian text under explicit linguistic constraints, ranging from morpho-syntactic requirements to creative and structural challenges. Our goal is to assess the extent to which LLMs can understand and manipulate language when guided by specific, sometimes artificial constraints. We evaluate a range of state-of-the-art models in both zero-and few-shot settings, comparing performance across constraint types and difficulty levels. Our results highlight significant variability across models and tasks, shedding light on the limits of controllable text generation and offering a new lens for probing LLMs’ generative and linguistic competence beyond traditional benchmarks. -
dc.description.allpeople Calderaro, Silvio; Miaschi, Alessio; Dell’Orletta, Felice -
dc.description.allpeopleoriginal Silvio Calderaro, Alessio Miaschi, Felice Dell’Orletta en
dc.description.fulltext open en
dc.description.numberofauthors 3 -
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/570746 -
dc.language.iso eng en
dc.relation.ispartofbook Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) en
dc.subject.keywordseng Large Language Models, Benchmark, Evaluation, Controllable Text Generation -
dc.subject.singlekeyword Large Language Models *
dc.subject.singlekeyword Benchmark *
dc.subject.singlekeyword Evaluation *
dc.subject.singlekeyword Controllable Text Generation *
dc.title The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
iris.mediafilter.data 2026/03/04 02:52:26 *
iris.orcid.lastModifiedDate 2026/03/03 16:54:21 *
iris.orcid.lastModifiedMillisecond 1772553261221 *
iris.sitodocente.maxattempts 1 -
Appare nelle tipologie: 04.01 Contributo in Atti di convegno
File in questo prodotto:
File Dimensione Formato  
ulibench.pdf

accesso aperto

Licenza: Creative commons
Dimensione 1.25 MB
Formato Adobe PDF
1.25 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/570746
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact