Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources
Michele Papucci;Giulia Venturi;Felice Dell'Orletta
2026
Abstract
This paper presents a study on readability-controlled Sentence Simplification for Italian, addressing the scarcity of annotated resources for low-resource languages. We introduce IMPaCTS (Italian Multilevel Parallel Corpus for Text Simplification), the first fully automatically created corpus of 1,444,160 original–simple sentence pairs automatically annotated with readability levels and linguistic features. It was generated using an Italian LLM prompted in zero-shot to produce multiple simplifications per input sentence. Increasing portions of the resource are used to fine-tune mono- and multilingual open-weight LLMs, conditioning them to generate simplifications at a target readability level. Results from automatic and human evaluations show that fine-tuning on IMPaCTS improves performance both in terms of task completion and adherence to the targeted readability levels compared to few-shot baselines.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Michele Papucci | en |
| dc.authority.people | Giulia Venturi | en |
| dc.authority.people | Felice Dell'Orletta | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.firstsubmission | 2026/05/11 18:20:38 | * |
| dc.date.issued | 2026 | - |
| dc.date.submission | 2026/05/11 18:27:44 | * |
| dc.description.abstracteng | This paper presents a study on readability-controlled Sentence Simplification for Italian, addressing the scarcity of annotated resources for low-resource languages. We introduce IMPaCTS (Italian Multilevel Parallel Corpus for Text Simplification), the first fully automatically created corpus of 1,444,160 original–simple sentence pairs automatically annotated with readability levels and linguistic features. It was generated using an Italian LLM prompted in zero-shot to produce multiple simplifications per input sentence. Increasing portions of the resource are used to fine-tune mono- and multilingual open-weight LLMs, conditioning them to generate simplifications at a target readability level. Results from automatic and human evaluations show that fine-tuning on IMPaCTS improves performance both in terms of task completion and adherence to the targeted readability levels compared to few-shot baselines. | - |
| dc.description.allpeople | Papucci, Michele; Venturi, Giulia; Dell'Orletta, Felice | - |
| dc.description.allpeopleoriginal | Michele Papucci, Giulia Venturi, Felice Dell'Orletta | en |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.doi | 10.63317/5fgm358dfxt5 | en |
| dc.identifier.isbn | 978-2-493814-49-4 | en |
| dc.identifier.source | manual | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/580421 | - |
| dc.identifier.url | http://www.lrec-conf.org/proceedings/lrec2026/pdf/2026.lrec2026-1.570 | en |
| dc.language.iso | eng | en |
| dc.relation.allauthors | Michele Papucci, Giulia Venturi, Felice Dell’Orletta | en |
| dc.relation.conferencename | 15th Language Resources and Evaluation Conference (LREC 2026) | en |
| dc.relation.conferenceplace | Palma de Mallorca | en |
| dc.relation.firstpage | 7178 | en |
| dc.relation.ispartofbook | Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026) | en |
| dc.relation.lastpage | 7191 | en |
| dc.relation.numberofpages | 14 | en |
| dc.subject.keywordseng | Controlled Sentence Simplification, Readability Assessment, Large Language Models | - |
| dc.subject.singlekeyword | Controlled Sentence Simplification | * |
| dc.subject.singlekeyword | Readability Assessment | * |
| dc.subject.singlekeyword | Large Language Models | * |
| dc.title | Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources | en |
| dc.type.circulation | International | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.impactfactor | yes | en |
| dc.type.invited | contribution | en |
| dc.type.miur | 273 | - |
| iris.orcid.lastModifiedDate | 2026/05/11 18:27:44 | * |
| iris.orcid.lastModifiedMillisecond | 1778516864585 | * |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.doi | 10.63317/5fgm358dfxt5 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.journalisindoaj | false | * |
| iris.unpaywall.metadataCallLastModified | 13/05/2026 04:18:04 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1778638684989 | - |
| iris.unpaywall.oastatus | closed | * |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


