Assessing document and sentence readability in less resourced languages and across textual genres

Felice Dell'Orletta; Simonetta Montemagni; Giulia Venturi
2014

Abstract

In this paper, we tackle three under-researched issues in the automatic readability assessment literature: the evaluation of text readability in less resourced languages, at the level of sentences (as opposed to documents), and across textual genres. Different solutions to these issues have been tested by using and refining READ-IT, the first advanced readability assessment tool for Italian, which combines traditional raw text features with lexical, morpho-syntactic and syntactic information. In READ-IT, readability assessment is carried out with respect to both documents and sentences, with the latter constituting an important novelty of the proposed approach: READ-IT shows high accuracy in the document classification task and promising results in the sentence classification scenario. By comparing the results of two versions of READ-IT, one classification-based and one ranking-based, we also show that readability assessment is strongly influenced by textual genre; for this reason a genre-oriented notion of readability is needed. With classification-based approaches, reliable results can only be achieved with genre-specific models. Since this is far from being a workable solution, especially for less resourced languages, a new ranking method for readability assessment is proposed, based on the notion of distance.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Felice Dell'Orletta it
dc.authority.people Simonetta Montemagni it
dc.authority.people Giulia Venturi it
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Journal article *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/18 04:35:38 -
dc.date.available 2024/02/18 04:35:38 -
dc.date.issued 2014 -
dc.description.affiliations ILC - Istituto di linguistica computazionale "Antonio Zampolli" -
dc.description.allpeople Felice Dell'Orletta; Simonetta Montemagni; Giulia Venturi -
dc.description.allpeopleoriginal Felice Dell'Orletta, Simonetta Montemagni, Giulia Venturi -
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.doi 10.1075/itl.165.2.03del -
dc.identifier.uri https://hdl.handle.net/20.500.14243/260898 -
dc.identifier.url http://www.ingentaconnect.com/content/jbp/itl/2014/00000165/00000002/art00005 -
dc.language.iso eng -
dc.relation.firstpage 163 -
dc.relation.issue 2 -
dc.relation.lastpage 193 -
dc.relation.volume 165 -
dc.subject.keywords readability assessment -
dc.subject.keywords less resourced languages -
dc.subject.keywords multi-level linguistic annotation -
dc.subject.keywords textual genres -
dc.title Assessing document and sentence readability in less resourced languages and across textual genres en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Journal contribution::01.01 Journal article it
dc.type.miur 262 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 285640 -
iris.orcid.lastModifiedDate 2024/02/22 23:56:05 *
iris.orcid.lastModifiedMillisecond 1708642565842 *
iris.scopus.extIssued 2014 -
iris.scopus.extTitle Assessing document and sentence readability in less resourced languages and across textual genres -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1075/itl.165.2.03del *
iris.unpaywall.isoa false *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.metadataCallLastModified 13/12/2025 03:44:34 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1765593874802 -
iris.unpaywall.oastatus closed *
Appears in types: 01.01 Journal article
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/260898