CNR Institutional Research Information System

Assessing the readability of sentences: which corpora and features?

Dell'Orletta F.; Wieling M.; Cimino A.; Venturi G.; Montemagni S.

2014

Abstract

The paper investigates the problem of sentence readability assessment, modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues connected with it, i.e. the corpora to be used for training and the identification of the most effective features for determining sentence readability. An existing readability assessment tool developed for Italian was specialized with respect to its training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a high number of features, mainly syntactic ones.
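This record does not describe how grafting was implemented, so the following is only a minimal sketch of the general technique, not the authors' code. It assumes a binary task (e.g. easy-to-read vs. difficult sentences), dense numeric feature vectors, and a binary logistic regression as the two-class maximum entropy model. Grafting grows the model one feature at a time: a feature whose weight is still zero is admitted only if the magnitude of the loss gradient for that weight exceeds the L1 penalty `lam`, and the order of admission yields a feature ranking. The function names, the proximal-gradient (ISTA) inner optimizer, and the default values of `lam`, `lr` and `inner_steps` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_gradient(X: List[List[float]], y: List[int],
                  w: List[float], b: float) -> Tuple[List[float], float]:
    """Gradient of the mean log-loss of a binary logistic (max-ent) model."""
    n, d = len(X), len(X[0])
    gw = [0.0] * d
    gb = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        r = p - yi  # derivative of the log-loss w.r.t. the linear score
        gb += r
        for j in range(d):
            gw[j] += r * xi[j]
    return [g / n for g in gw], gb / n


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrinks v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def grafting(X: List[List[float]], y: List[int], lam: float = 0.01,
             lr: float = 0.5, inner_steps: int = 200):
    """Hypothetical helper. Returns (ranking, weights, bias), where ranking
    lists feature indices in the order grafting added them to the model."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    active: List[int] = []
    while True:
        gw, _ = loss_gradient(X, y, w, b)
        # Grafting test: a weight sitting at zero can lower the
        # L1-penalised objective only if |dLoss/dw_j| exceeds lam.
        candidates = [j for j in range(d)
                      if j not in active and abs(gw[j]) > lam]
        if not candidates:
            break
        best = max(candidates, key=lambda j: abs(gw[j]))
        active.append(best)
        # Re-optimise the bias and all active weights with proximal
        # gradient steps (ISTA); inactive weights stay fixed at zero.
        for _ in range(inner_steps):
            gw, gb = loss_gradient(X, y, w, b)
            b -= lr * gb
            for j in active:
                w[j] = soft_threshold(w[j] - lr * gw[j], lr * lam)
    return active, w, b
```

On a toy dataset such as `X = [[1.0, 0.5], [2.0, 0.5], [-1.0, 0.5], [-2.0, 0.5]]` with `y = [1, 1, 0, 0]`, where feature 0 separates the classes and feature 1 is constant, `grafting(X, y)` returns the ranking `[0]`: the constant feature never passes the gradient test, so its weight stays at zero. With many candidate features, the length of the ranking before the test fails gives a rough sense of how many features the task actually needs.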

Full record (Dublin Core)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Dell'Orletta F	it
dc.authority.people	Wieling M	it
dc.authority.people	Cimino A	it
dc.authority.people	Venturi G	it
dc.authority.people	Montemagni S	it
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contribution in Conference Proceedings	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/18 15:25:56	-
dc.date.available	2024/02/18 15:25:56	-
dc.date.issued	2014	-
dc.description.abstracteng	The paper investigates the problem of sentence readability assessment, modelled as a classification task, with a specific view to text simplification. In particular, it addresses two open issues connected with it, i.e. the corpora to be used for training and the identification of the most effective features for determining sentence readability. An existing readability assessment tool developed for Italian was specialized with respect to its training corpus and learning algorithm. A maximum entropy-based feature selection and ranking algorithm (grafting) was used to identify the most relevant features: it turned out that assessing the readability of sentences is a complex task, requiring a high number of features, mainly syntactic ones.	-
dc.description.affiliations	Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR); Department of Humanities Computing, University of Groningen, The Netherlands; Department of Quantitative Linguistics, University of Tübingen, Germany	-
dc.description.allpeople	Dell'Orletta F.; Wieling M.; Cimino A.; Venturi G.; Montemagni S.	-
dc.description.allpeopleoriginal	Dell'Orletta F., Wieling M., Cimino A., Venturi G., Montemagni S.	-
dc.description.fulltext	none	en
dc.description.numberofauthors	5	-
dc.identifier.isbn	978-1-941643-03-7	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/266274	-
dc.identifier.url	http://acl2014.org/acl2014/W14-18/pdf/W14-1820.pdf	-
dc.language.iso	eng	-
dc.publisher.country	USA	-
dc.publisher.name	Association for Computational Linguistics	-
dc.publisher.place	Stroudsburg	-
dc.relation.conferencedate	26 June 2014	-
dc.relation.conferencename	9th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2014)	-
dc.relation.conferenceplace	Baltimore, Maryland, USA	-
dc.relation.firstpage	163	-
dc.relation.ispartofbook	Proceedings of the 9th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2014)	-
dc.relation.lastpage	173	-
dc.title	Assessing the readability of sentences: which corpora and features?	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Conference contribution::04.01 Contribution in Conference Proceedings	en
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	-
dc.ugov.descaux1	294084	-
iris.orcid.lastModifiedDate	2024/02/22 09:27:11	*
iris.orcid.lastModifiedMillisecond	1708590431635	*
iris.scopus.extIssued	2014	-
iris.scopus.extTitle	Assessing the Readability of Sentences: Which Corpora and Features?	-
iris.sitodocente.maxattempts	1	-
Appears in types:	04.01 Contribution in Conference Proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/266274
