Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian

Michele Papucci; Giulia Venturi; Felice Dell'Orletta
2025

Abstract

Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, yet controlling model outputs remains a challenge. In this study, we explore the use of LLMs to generate high-quality synthetic data for Automatic Text Simplification (ATS), evaluating the ability of models fine-tuned on Italian to produce multiple simplified versions of the same original sentence that vary in readability and in their lexical and (morpho-)syntactic characteristics. The approach is tested across two domains, Wikipedia and Public Administration, allowing us to explore domain sensitivity. Additionally, we compare the linguistic phenomena observed in the generated data with those found in ATS resources previously created through manual or semi-automatic methods. Our results suggest that the best-performing LLM can generate linguistically diverse simplifications that align with known simplification patterns, offering a promising direction for building reliable ATS resources, including simplifications suited to varying levels of reader proficiency.
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Michele Papucci en
dc.authority.people Giulia Venturi en
dc.authority.people Felice Dell'Orletta en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Not assigned *
dc.contributor.area Not assigned *
dc.contributor.area Not assigned *
dc.date.firstsubmission 2026/03/03 19:39:30 *
dc.date.issued 2025 -
dc.date.submission 2026/03/03 19:39:30 *
dc.description.allpeople Papucci, Michele; Venturi, Giulia; Dell'Orletta, Felice -
dc.description.allpeopleoriginal Michele Papucci, Giulia Venturi, Felice Dell'Orletta en
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.isbn 979-12-243-0587-3 en
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/570801 -
dc.identifier.url https://aclanthology.org/2025.clicit-1.82/ en
dc.language.iso eng en
dc.miur.last.status.update 2026-03-03T18:40:06Z *
dc.publisher.name CEUR Workshop Proceedings en
dc.relation.allauthors Michele Papucci, Giulia Venturi, Felice Dell’Orletta en
dc.relation.conferencename Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) en
dc.relation.firstpage 870 en
dc.relation.ispartofbook Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) en
dc.relation.lastpage 885 en
dc.subject.keywordseng Automatic Text Simplification, Large Language Models, Synthetic Data, Linguistic Complexity, Sentence Readability -
dc.subject.singlekeyword Automatic Text Simplification *
dc.subject.singlekeyword Large Language Models *
dc.subject.singlekeyword Synthetic Data *
dc.subject.singlekeyword Linguistic Complexity *
dc.subject.singlekeyword Sentence Readability *
dc.title Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings en
dc.type.invited contributo en
dc.type.miur 273 -
iris.orcid.lastModifiedDate 2026/03/03 19:39:30 *
iris.orcid.lastModifiedMillisecond 1772563170213 *
iris.sitodocente.maxattempts 1 -

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570801
Warning! The data displayed have not been validated by the institution.
