PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Dominique Brunato;Andrea Cimino;Felice Dell'Orletta;Giulia Venturi
2016
Abstract
In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suitable for text simplification. The method requires a large amount of text, which can be easily extracted from the web, making it suitable for less-resourced languages as well. We test it on Italian, making available the largest Italian corpus for automatic text simplification.

Files in this item:
| File | Size | Format | |
|---|---|---|---|
| prod_366726-doc_171509.pdf (open access). Description: PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification. Type: Publisher's version (PDF). License: Creative Commons | 336.89 kB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


