Assessing the readability of sentences: which corpora and features?

Dell'Orletta F; Wieling M; Cimino A; Venturi G; Montemagni S
2014

Abstract

The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues connected with it, i.e. the corpora to be used for training, and the identification of the most effective features to determine sentence readability. An existing readability assessment tool developed for Italian was specialized at the level of training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a high number of features, mainly syntactic ones.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Dell'Orletta F it
dc.authority.people Wieling M it
dc.authority.people Cimino A it
dc.authority.people Venturi G it
dc.authority.people Montemagni S it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/18 15:25:56 -
dc.date.available 2024/02/18 15:25:56 -
dc.date.issued 2014 -
dc.description.abstracteng The paper investigates the problem of sentence readability assessment, which is modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues connected with it, i.e. the corpora to be used for training, and the identification of the most effective features to determine sentence readability. An existing readability assessment tool developed for Italian was specialized at the level of training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a high number of features, mainly syntactic ones. -
dc.description.affiliations Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR); Department of Humanities Computing, University of Groningen, The Netherlands; Department of Quantitative Linguistics, University of Tubingen, Germany -
dc.description.allpeople Dell'Orletta F.; Wieling M.; Cimino A.; Venturi G.; Montemagni S. -
dc.description.allpeopleoriginal Dell'Orletta F., Wieling M., Cimino A., Venturi G., Montemagni S. -
dc.description.fulltext none en
dc.description.numberofauthors 5 -
dc.identifier.isbn 978-1-941643-03-7 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/266274 -
dc.identifier.url http://acl2014.org/acl2014/W14-18/pdf/W14-1820.pdf -
dc.language.iso eng -
dc.publisher.country USA -
dc.publisher.name Association for Computational Linguistics -
dc.publisher.place Stroudsburg -
dc.relation.conferencedate 26 June 2014 -
dc.relation.conferencename 9th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2014) -
dc.relation.conferenceplace Baltimore, Maryland, USA -
dc.relation.firstpage 163 -
dc.relation.ispartofbook Proceedings of 9th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2014) -
dc.relation.lastpage 173 -
dc.title Assessing the readability of sentences: which corpora and features? en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 294084 -
iris.orcid.lastModifiedDate 2024/02/22 09:27:11 *
iris.orcid.lastModifiedMillisecond 1708590431635 *
iris.scopus.extIssued 2014 -
iris.scopus.extTitle Assessing the Readability of Sentences: Which Corpora and Features? -
iris.sitodocente.maxattempts 1 -
Appears in types: 04.01 Contribution in conference proceedings
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/266274