PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification

Dominique Brunato; Andrea Cimino; Felice Dell'Orletta; Giulia Venturi
2016

Abstract

In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suitable for text simplification. The method requires a large amount of text, which can easily be extracted from the web, making it suitable for less-resourced languages as well. We test it on Italian and make available the largest Italian corpus for automatic text simplification.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Dominique Brunato it
dc.authority.people Andrea Cimino it
dc.authority.people Felice Dell'Orletta it
dc.authority.people Giulia Venturi it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 07:45:36 -
dc.date.available 2024/02/20 07:45:36 -
dc.date.issued 2016 -
dc.description.affiliations Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR) -
dc.description.allpeople Brunato, Dominique; Cimino, Andrea; Dell'Orletta, Felice; Venturi, Giulia -
dc.description.allpeopleoriginal Dominique Brunato, Andrea Cimino, Felice Dell'Orletta, Giulia Venturi -
dc.description.fulltext open en
dc.description.numberofauthors 4 -
dc.identifier.doi 10.18653/v1/d16-1034 en
dc.identifier.isbn 978-1-945626-25-8 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/333951 -
dc.identifier.url https://www.aclweb.org/anthology/D/D16/D16-1034.pdf en
dc.language.iso eng en
dc.miur.last.status.update 2024-07-22T14:34:44Z *
dc.publisher.country USA en
dc.publisher.name Association for Computational Linguistics en
dc.publisher.place Stroudsburg en
dc.relation.conferencedate 01-05/11/2016 en
dc.relation.conferencename Conference on Empirical Methods in Natural Language Processing (EMNLP 2016) en
dc.relation.conferenceplace Austin, Texas en
dc.relation.firstpage 351 en
dc.relation.lastpage 361 en
dc.relation.numberofpages 11 en
dc.subject.keywords Automatic Text Simplification -
dc.subject.keywords Sentence alignment -
dc.subject.keywords Italian corpus -
dc.subject.singlekeyword Automatic Text Simplification *
dc.subject.singlekeyword Sentence alignment *
dc.subject.singlekeyword Italian corpus *
dc.title PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified en
dc.ugov.descaux1 366726 -
iris.mediafilter.data 2025/04/18 02:55:38 *
iris.orcid.lastModifiedDate 2024/09/17 11:38:45 *
iris.orcid.lastModifiedMillisecond 1726565925697 *
iris.scopus.extIssued 2016 -
iris.scopus.extTitle PACCSS-It: A parallel corpus of complex-simple sentences for automatic text simplification -
iris.sitodocente.maxattempts 12 -
iris.unpaywall.bestoahost publisher *
iris.unpaywall.bestoaversion publishedVersion *
iris.unpaywall.doi 10.18653/v1/d16-1034 *
iris.unpaywall.hosttype publisher *
iris.unpaywall.isoa true *
iris.unpaywall.landingpage https://doi.org/10.18653/v1/d16-1034 *
iris.unpaywall.license cc-by *
iris.unpaywall.metadataCallLastModified 20/06/2025 04:57:08 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1750388228378 -
iris.unpaywall.oastatus hybrid *
iris.unpaywall.pdfurl https://www.aclweb.org/anthology/D16-1034.pdf *
Appears in types: 04.01 Contributo in Atti di convegno (contribution in conference proceedings)
Files in this item:
File: prod_366726-doc_171509.pdf (open access)
Description: PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Type: Publisher's version (PDF)
License: Creative Commons
Size: 336.89 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/333951