PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Dominique Brunato;Andrea Cimino;Felice Dell'Orletta;Giulia Venturi
2016
Abstract
In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suitable for text simplification. The method requires a large amount of text, which can be easily extracted from the web, making it suitable also for less-resourced languages. We test it on Italian, making available the largest Italian corpus for automatic text simplification.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Dominique Brunato | it |
| dc.authority.people | Andrea Cimino | it |
| dc.authority.people | Felice Dell'Orletta | it |
| dc.authority.people | Giulia Venturi | it |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/20 07:45:36 | - |
| dc.date.available | 2024/02/20 07:45:36 | - |
| dc.date.issued | 2016 | - |
| dc.description.abstracteng | In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource we develop a new method for automatically acquiring a corpus of complex-simple paired sentences able to intercept structural transformations and particularly suitable for text simplification. The method requires a wide amount of texts that can be easily extracted from the web making it suitable also for less-resourced languages. We test it on the Italian language making available the biggest Italian corpus for automatic text simplification. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR) | - |
| dc.description.allpeople | Brunato, Dominique; Cimino, Andrea; Dell'Orletta, Felice; Venturi, Giulia | - |
| dc.description.allpeopleoriginal | Dominique Brunato, Andrea Cimino, Felice Dell'Orletta, Giulia Venturi | - |
| dc.description.fulltext | open | en |
| dc.description.numberofauthors | 4 | - |
| dc.identifier.doi | 10.18653/v1/d16-1034 | en |
| dc.identifier.isbn | 978-1-945626-25-8 | en |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/333951 | - |
| dc.identifier.url | https://www.aclweb.org/anthology/D/D16/D16-1034.pdf | en |
| dc.language.iso | eng | en |
| dc.miur.last.status.update | 2024-07-22T14:34:44Z | * |
| dc.publisher.country | USA | en |
| dc.publisher.name | Association for Computational Linguistics | en |
| dc.publisher.place | Stroudsburg | en |
| dc.relation.conferencedate | 01-05/11/2016 | en |
| dc.relation.conferencename | Conference on Empirical Methods in Natural Language Processing (EMNLP 2016) | en |
| dc.relation.conferenceplace | Austin, Texas | en |
| dc.relation.firstpage | 351 | en |
| dc.relation.lastpage | 361 | en |
| dc.relation.numberofpages | 11 | en |
| dc.subject.keywords | Automatic Text Simplification | - |
| dc.subject.keywords | Sentence alignment | - |
| dc.subject.keywords | Italian corpus | - |
| dc.subject.singlekeyword | Automatic Text Simplification | * |
| dc.subject.singlekeyword | Sentence alignment | * |
| dc.subject.singlekeyword | Italian corpus | * |
| dc.title | PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | en |
| dc.ugov.descaux1 | 366726 | - |
| iris.mediafilter.data | 2025/04/18 02:55:38 | * |
| iris.orcid.lastModifiedDate | 2024/09/17 11:38:45 | * |
| iris.orcid.lastModifiedMillisecond | 1726565925697 | * |
| iris.scopus.extIssued | 2016 | - |
| iris.scopus.extTitle | PACCSS-It: A parallel corpus of complex-simple sentences for automatic text simplification | - |
| iris.sitodocente.maxattempts | 12 | - |
| iris.unpaywall.bestoahost | publisher | * |
| iris.unpaywall.bestoaversion | publishedVersion | * |
| iris.unpaywall.doi | 10.18653/v1/d16-1034 | * |
| iris.unpaywall.hosttype | publisher | * |
| iris.unpaywall.isoa | true | * |
| iris.unpaywall.landingpage | https://doi.org/10.18653/v1/d16-1034 | * |
| iris.unpaywall.license | cc-by | * |
| iris.unpaywall.metadataCallLastModified | 20/06/2025 04:57:08 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1750388228378 | - |
| iris.unpaywall.oastatus | hybrid | * |
| iris.unpaywall.pdfurl | https://www.aclweb.org/anthology/D16-1034.pdf | * |
| Appears in types: | 04.01 Contributo in Atti di convegno | |

Files in this product:

| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| prod_366726-doc_171509.pdf (open access) | PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification | Publisher's version (PDF) | Creative Commons | 336.89 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


