Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources

Michele Papucci;Giulia Venturi;Felice Dell'Orletta
2026

Abstract

This paper presents a study on readability-controlled Sentence Simplification for Italian, addressing the scarcity of annotated resources for low-resource languages. We introduce IMPaCTS (Italian Multilevel Parallel Corpus for Text Simplification), the first fully automatically created corpus of 1,444,160 original–simple sentence pairs automatically annotated with readability levels and linguistic features. The corpus was generated by prompting an Italian LLM in a zero-shot setting to produce multiple simplifications per input sentence. Increasing portions of the resource are then used to fine-tune monolingual and multilingual open-weight LLMs, conditioning them to generate simplifications at a target readability level. Automatic and human evaluations show that, compared to few-shot baselines, fine-tuning on IMPaCTS improves both task completion and adherence to the targeted readability level.
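As a rough illustration of what conditioning on a target readability level could look like in practice, the sketch below turns an original–simple pair into a prompt/completion record and yields nested training subsets of growing size. It is not taken from the paper: the `SimplificationPair` class, the integer level encoding, the prompt wording, and both helper functions are assumptions made for this example.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class SimplificationPair:
    """One corpus entry (hypothetical layout, not the IMPaCTS schema)."""
    original: str     # source sentence
    simple: str       # automatically generated simplification
    readability: int  # assumed discretised readability level of `simple`


def build_example(pair: SimplificationPair) -> dict[str, str]:
    """Format one pair as a prompt/completion record for fine-tuning.

    The target level is written into the prompt so that a model trained on
    such records can be asked for a specific level at inference time.
    """
    prompt = (
        f"Simplify the sentence to readability level {pair.readability}.\n"
        f"Sentence: {pair.original}\n"
        "Simplified:"
    )
    return {"prompt": prompt, "completion": " " + pair.simple}


def increasing_portions(
    pairs: list[SimplificationPair], sizes: list[int]
) -> Iterator[list[SimplificationPair]]:
    """Yield nested subsets (the first n pairs) in order of growing size."""
    for n in sorted(sizes):
        yield pairs[:n]
```

Writing the level into the prompt as plain text is only one possible design; a dedicated control token or a numeric readability score would serve the same purpose.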
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Michele Papucci en
dc.authority.people Giulia Venturi en
dc.authority.people Felice Dell'Orletta en
dc.date.issued 2026 -
dc.identifier.doi 10.63317/5fgm358dfxt5 en
dc.identifier.isbn 978-2-493814-49-4 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/580421 -
dc.identifier.url http://www.lrec-conf.org/proceedings/lrec2026/pdf/2026.lrec2026-1.570 en
dc.language.iso eng en
dc.relation.conferencename 15th Language Resources and Evaluation Conference (LREC 2026) en
dc.relation.conferenceplace Palma de Mallorca en
dc.relation.firstpage 7178 en
dc.relation.ispartofbook Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026) en
dc.relation.lastpage 7191 en
dc.relation.numberofpages 14 en
dc.subject.keywordseng Controlled Sentence Simplification, Readability Assessment, Large Language Models -
dc.title Controllable Sentence Simplification in Italian: Fine-Tuning Large Language Models on Automatically Generated Resources en
dc.type.circulation International en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings en
dc.type.impactfactor yes en
dc.type.invited contribution en
dc.type.miur 273 -