Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian
Michele Papucci; Giulia Venturi; Felice Dell'Orletta
2025
Abstract
Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, yet controlling model outputs remains a challenge. In this study, we explore the use of LLMs to generate high-quality synthetic data for Automatic Text Simplification (ATS), evaluating the ability of models fine-tuned on Italian to produce multiple simplified versions of the same original sentence that vary in readability and in their lexical and (morpho-)syntactic characteristics. The approach is tested across two domains, Wikipedia and Public Administration, allowing us to explore domain sensitivity. Additionally, we compare the linguistic phenomena observed in the generated data with those found in ATS resources previously created through manual or semi-automatic methods. Our results suggest that the best-performing LLM can generate linguistically diverse simplifications that align with known simplification patterns, offering a promising direction for building reliable ATS resources, including simplifications suited to varying levels of reader proficiency.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Michele Papucci | en |
| dc.authority.people | Giulia Venturi | en |
| dc.authority.people | Felice Dell'Orletta | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.firstsubmission | 2026/03/03 19:39:30 | * |
| dc.date.issued | 2025 | - |
| dc.date.submission | 2026/03/03 19:39:30 | * |
| dc.description.abstracteng | Recent advances in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, yet controlling model outputs remains a challenge. In this study, we explore the use of LLMs to generate high-quality synthetic data for Automatic Text Simplification (ATS), evaluating the ability of models fine-tuned on Italian to produce multiple simplified versions of the same original sentence that vary in readability and in their lexical and (morpho-)syntactic characteristics. The approach is tested across two domains, Wikipedia and Public Administration, allowing us to explore domain sensitivity. Additionally, we compare the linguistic phenomena observed in the generated data with those found in ATS resources previously created through manual or semi-automatic methods. Our results suggest that the best-performing LLM can generate linguistically diverse simplifications that align with known simplification patterns, offering a promising direction for building reliable ATS resources, including simplifications suited to varying levels of reader proficiency. | - |
| dc.description.allpeople | Papucci, Michele; Venturi, Giulia; Dell'Orletta, Felice | - |
| dc.description.allpeopleoriginal | Michele Papucci, Giulia Venturi, Felice Dell'Orletta | en |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.isbn | 979-12-243-0587-3 | en |
| dc.identifier.source | manual | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/570801 | - |
| dc.identifier.url | https://aclanthology.org/2025.clicit-1.82/ | en |
| dc.language.iso | eng | en |
| dc.miur.last.status.update | 2026-03-03T18:40:06Z | * |
| dc.publisher.name | CEUR Workshop Proceedings | en |
| dc.relation.allauthors | Michele Papucci, Giulia Venturi, Felice Dell’Orletta | en |
| dc.relation.conferencename | Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) | en |
| dc.relation.firstpage | 870 | en |
| dc.relation.ispartofbook | Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025) | en |
| dc.relation.lastpage | 885 | en |
| dc.subject.keywordseng | Automatic Text Simplification, Large Language Models, Synthetic Data, Linguistic Complexity, Sentence Readability | - |
| dc.subject.singlekeyword | Automatic Text Simplification | * |
| dc.subject.singlekeyword | Large Language Models | * |
| dc.subject.singlekeyword | Synthetic Data | * |
| dc.subject.singlekeyword | Linguistic Complexity | * |
| dc.subject.singlekeyword | Sentence Readability | * |
| dc.title | Generating and Evaluating Multi-Level Text Simplification: A Case Study on Italian | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.invited | contributo | en |
| dc.type.miur | 273 | - |
| iris.orcid.lastModifiedDate | 2026/03/03 19:39:30 | * |
| iris.orcid.lastModifiedMillisecond | 1772563170213 | * |
| iris.sitodocente.maxattempts | 1 | - |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


