PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification

Dominique Brunato;Andrea Cimino;Felice Dell'Orletta;Giulia Venturi
2016

Abstract

In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suitable for text simplification. The method requires a large amount of text, which can be easily extracted from the web, making it suitable also for less-resourced languages. We test it on Italian and make available the largest Italian corpus for automatic text simplification.
Istituto di linguistica computazionale "Antonio Zampolli" - ILC
978-1-945626-25-8
Automatic Text Simplification
Sentence alignment
Italian corpus
Files in this record:
prod_366726-doc_171509.pdf (open access)
Description: PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Type: Publisher's version (PDF)
License: Creative Commons
Size: 336.89 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/333951