Assessing the readability of sentences: which corpora and features?
Dell'Orletta F.; Wieling M.; Cimino A.; Venturi G.; Montemagni S.
2014
Abstract
The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues related to this task, i.e. the corpora to be used for training and the identification of the most effective features for determining sentence readability. An existing readability assessment tool developed for Italian was specialized at the level of training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a large number of features, mainly syntactic ones.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Dell'Orletta F | it |
| dc.authority.people | Wieling M | it |
| dc.authority.people | Cimino A | it |
| dc.authority.people | Venturi G | it |
| dc.authority.people | Montemagni S | it |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/18 15:25:56 | - |
| dc.date.available | 2024/02/18 15:25:56 | - |
| dc.date.issued | 2014 | - |
| dc.description.abstracteng | The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues related to this task, i.e. the corpora to be used for training and the identification of the most effective features for determining sentence readability. An existing readability assessment tool developed for Italian was specialized at the level of training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a large number of features, mainly syntactic ones. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR); Department of Humanities Computing, University of Groningen, The Netherlands; Department of Quantitative Linguistics, University of Tübingen, Germany | - |
| dc.description.allpeople | Dell'Orletta F.; Wieling M.; Cimino A.; Venturi G.; Montemagni S. | - |
| dc.description.allpeopleoriginal | Dell'Orletta F., Wieling M., Cimino A., Venturi G., Montemagni S. | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 5 | - |
| dc.identifier.isbn | 978-1-941643-03-7 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/266274 | - |
| dc.identifier.url | http://acl2014.org/acl2014/W14-18/pdf/W14-1820.pdf | - |
| dc.language.iso | eng | - |
| dc.publisher.country | USA | - |
| dc.publisher.name | Association for Computational Linguistics | - |
| dc.publisher.place | Stroudsburg | - |
| dc.relation.conferencedate | 26 June 2014 | - |
| dc.relation.conferencename | 9th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2014) | - |
| dc.relation.conferenceplace | Baltimore, Maryland, USA | - |
| dc.relation.firstpage | 163 | - |
| dc.relation.ispartofbook | Proceedings of 9th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2014) | - |
| dc.relation.lastpage | 173 | - |
| dc.title | Assessing the readability of sentences: which corpora and features? | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | en |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 294084 | - |
| iris.orcid.lastModifiedDate | 2024/02/22 09:27:11 | * |
| iris.orcid.lastModifiedMillisecond | 1708590431635 | * |
| iris.scopus.extIssued | 2014 | - |
| iris.scopus.extTitle | Assessing the Readability of Sentences: Which Corpora and Features? | - |
| iris.sitodocente.maxattempts | 1 | - |
| Appears in types: | 04.01 Contribution in conference proceedings | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
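The abstract names grafting, a gradient-based feature selection procedure for regularized models, as the method used to rank readability features. The Python sketch below is only an illustration under stated assumptions, not the authors' tool: it assumes a binary maximum entropy (logistic regression) classifier separating easy- from difficult-to-read sentences, an L1 penalty `lam`, and re-optimization by proximal gradient descent. The function names `grafting`, `logistic_loss_grad` and `soft_threshold` are hypothetical. At each step the inactive feature with the largest loss gradient is added if that gradient exceeds `lam`; the order of inclusion serves as the feature ranking.

```python
import numpy as np


def logistic_loss_grad(w, b, X, y):
    """Mean logistic (maximum entropy) loss and its gradients; labels y in {0, 1}."""
    z = X @ w + b
    p = np.exp(-np.logaddexp(0.0, -z))  # numerically stable sigmoid(z)
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    residual = (p - y) / len(y)
    return loss, X.T @ residual, residual.sum()


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def grafting(X, y, lam=0.01, inner_iters=500, max_features=None):
    """Hypothetical grafting sketch: returns (features in order of inclusion, w, b)."""
    n_samples, n_features = X.shape
    max_features = n_features if max_features is None else min(max_features, n_features)
    # Step size 1/L, where L = 0.25 * sigma_max([X, 1])^2 / n bounds the
    # Lipschitz constant of the loss gradient with respect to (w, b).
    Xb = np.hstack([X, np.ones((n_samples, 1))])
    step = 1.0 / (0.25 * np.linalg.norm(Xb, 2) ** 2 / n_samples)
    # Start from the intercept-only model: its optimal bias is the log-odds of y.
    ybar = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    w, b = np.zeros(n_features), np.log(ybar / (1 - ybar))
    active = []
    while len(active) < max_features:
        _, grad_w, _ = logistic_loss_grad(w, b, X, y)
        inactive = [j for j in range(n_features) if j not in active]
        best = max(inactive, key=lambda j: abs(grad_w[j]))
        # Grafting test: a zero weight can only lower the penalized
        # objective if |dLoss/dw_j| exceeds the L1 penalty lam.
        if abs(grad_w[best]) <= lam:
            break
        active.append(best)
        idx = np.array(active)
        # Re-optimize all active weights (and the unpenalized bias) with
        # proximal gradient descent (ISTA) on loss + lam * ||w_active||_1.
        for _ in range(inner_iters):
            _, grad_w, grad_b = logistic_loss_grad(w, b, X, y)
            w[idx] = soft_threshold(w[idx] - step * grad_w[idx], step * lam)
            b -= step * grad_b
    return active, w, b
```

In this sketch, features that enter `active` first are the ones ranked as most informative; a larger `lam` stops the procedure earlier and keeps fewer features, which is one way to probe how many features a sentence-level classifier actually needs.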


