That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models

Sarti G; Brunato D; Dell'Orletta F
2021

Abstract

This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how they are modeled in a neural language model (NLM). The first perspective takes into account multiple online behavioral metrics obtained from eye-tracking recordings. The second one concerns the offline perception of complexity measured by explicit human judgments. Using a broad spectrum of linguistic features modeling lexical, morpho-syntactic, and syntactic properties of sentences, we perform a comprehensive analysis of linguistic phenomena associated with the two complexity viewpoints and report similarities and differences. We then show the effectiveness of linguistic features when explicitly leveraged by a regression model for predicting sentence complexity and compare its results with the ones obtained by a fine-tuned neural language model. We finally probe the NLM's linguistic competence before and after fine-tuning, highlighting how linguistic information encoded in representations changes when the model learns to predict complexity.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Sarti G it
dc.authority.people Brunato D it
dc.authority.people Dell'Orletta F it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 13:12:51 -
dc.date.available 2024/02/20 13:12:51 -
dc.date.issued 2021 -
dc.description.affiliations University of Trieste, International School for Advanced Studies (SISSA), Trieste, Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), Pisa -
dc.description.allpeople Sarti G; Brunato D; Dell'Orletta F -
dc.description.allpeopleoriginal Sarti G, Brunato D, Dell'Orletta F -
dc.description.fulltext none en
dc.description.numberofauthors 2 -
dc.identifier.doi 10.18653/v1/2021.cmcl-1.5 -
dc.identifier.isbn 978-1-954085-35-0 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/440173 -
dc.identifier.url https://aclanthology.org/2021.cmcl-1.5 -
dc.language.iso eng -
dc.relation.conferencedate 10/06/2021 -
dc.relation.conferencename Proceedings of Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2021) -
dc.relation.firstpage 48 -
dc.relation.lastpage 60 -
dc.subject.keywords linguistic complexity -
dc.subject.keywords eyetracking -
dc.subject.keywords human evaluation -
dc.subject.singlekeyword linguistic complexity *
dc.subject.singlekeyword eyetracking *
dc.subject.singlekeyword human evaluation *
dc.title That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 464972 -
iris.orcid.lastModifiedDate 2024/03/02 04:09:20 *
iris.orcid.lastModifiedMillisecond 1709348960784 *
iris.scopus.extIssued 2021 -
iris.scopus.extTitle That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.bestoaversion publishedVersion *
iris.unpaywall.doi 10.18653/v1/2021.cmcl-1.5 *
iris.unpaywall.isoa true *
iris.unpaywall.landingpage https://doi.org/10.18653/v1/2021.cmcl-1.5 *
iris.unpaywall.license cc-by *
iris.unpaywall.metadataCallLastModified 21/12/2025 05:36:41 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1766291801559 -
iris.unpaywall.oastatus gold *
iris.unpaywall.pdfurl https://aclanthology.org/2021.cmcl-1.5.pdf *
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/440173