CNR Institutional Research Information System

Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian

Michele Papucci; Giulia Venturi; Felice Dell'Orletta

2025

Abstract

Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, yet controlling model outputs remains a challenge. In this study, we explore the use of LLMs to generate high-quality synthetic data for Automatic Text Simplification (ATS), evaluating the ability of models fine-tuned on Italian to produce multiple simplified versions of the same original sentence that vary in readability and in their lexical and (morpho-)syntactic characteristics. The approach is tested across two domains, Wikipedia and Public Administration, allowing us to explore domain sensitivity. Additionally, we compare the linguistic phenomena observed in the generated data with those found in ATS resources previously created through manual or semi-automatic methods. Our results suggest that the best-performing LLM can generate linguistically diverse simplifications that align with known simplification patterns, offering a promising direction for building reliable ATS resources, including simplifications suited to varying levels of reader proficiency.

Full record (Dublin Core)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Michele Papucci	en
dc.authority.people	Giulia Venturi	en
dc.authority.people	Felice Dell'Orletta	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contribution in conference proceedings	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Not assigned	*
dc.contributor.area	Not assigned	*
dc.contributor.area	Not assigned	*
dc.date.firstsubmission	2026/03/03 19:39:30	*
dc.date.issued	2025	-
dc.date.submission	2026/03/03 19:39:30	*
dc.description.abstracteng	Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, yet controlling model outputs remains a challenge. In this study, we explore the use of LLMs to generate high-quality synthetic data for Automatic Text Simplification (ATS), evaluating the ability of models fine-tuned on Italian to produce multiple simplified versions of the same original sentence that vary in readability and in their lexical and (morpho-)syntactic characteristics. The approach is tested across two domains, Wikipedia and Public Administration, allowing us to explore domain sensitivity. Additionally, we compare the linguistic phenomena observed in the generated data with those found in ATS resources previously created through manual or semi-automatic methods. Our results suggest that the best-performing LLM can generate linguistically diverse simplifications that align with known simplification patterns, offering a promising direction for building reliable ATS resources, including simplifications suited to varying levels of reader proficiency.	-
dc.description.allpeople	Papucci, Michele; Venturi, Giulia; Dell'Orletta, Felice	-
dc.description.allpeopleoriginal	Michele Papucci, Giulia Venturi, Felice Dell'Orletta	en
dc.description.fulltext	none	en
dc.description.numberofauthors	3	-
dc.identifier.isbn	979-12-243-0587-3	en
dc.identifier.source	manual	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/570801	-
dc.identifier.url	https://aclanthology.org/2025.clicit-1.82/	en
dc.language.iso	eng	en
dc.miur.last.status.update	2026-03-03T18:40:06Z	*
dc.publisher.name	CEUR Workshop Proceedings	en
dc.relation.allauthors	Michele Papucci, Giulia Venturi, Felice Dell’Orletta	en
dc.relation.conferencename	Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)	en
dc.relation.firstpage	870	en
dc.relation.ispartofbook	Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)	en
dc.relation.lastpage	885	en
dc.subject.keywordseng	Automatic Text Simplification, Large Language Models, Synthetic Data, Linguistic Complexity, Sentence Readability	-
dc.subject.singlekeyword	Automatic Text Simplification	*
dc.subject.singlekeyword	Large Language Models	*
dc.subject.singlekeyword	Synthetic Data	*
dc.subject.singlekeyword	Linguistic Complexity	*
dc.subject.singlekeyword	Sentence Readability	*
dc.title	Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Conference contribution::04.01 Contribution in conference proceedings	it
dc.type.invited	contributed	en
dc.type.miur	273	-
iris.orcid.lastModifiedDate	2026/03/03 19:39:30	*
iris.orcid.lastModifiedMillisecond	1772563170213	*
iris.sitodocente.maxattempts	1	-

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570801

Warning: the data displayed have not been validated by the institution.
