Comparative Evaluation of Computational Models Predicting Eye Fixation Patterns During Reading: Insights from Transformers and Simpler Architectures

Alessandro Lento (first author); Andrea Nadalini (second author); Nadia Khlif; Vito Pirrelli; Claudia Marzi; Marcello Ferro (last author)
2024

Abstract

Eye-tracking records of natural text reading are known to provide significant insights into the cognitive processes underlying word processing and text comprehension, with gaze patterns, such as fixation duration and saccadic movements, being modulated by morphological, lexical, and higher-level structural properties of the text being read. Although some of these effects have been simulated with computational models, it is still not clear how accurately computational modelling can predict complex fixation patterns in connected text reading. State-of-the-art neural architectures have shown promising results, with pre-trained transformer-based classifiers having recently been claimed to outperform other competitors, achieving beyond 95% accuracy. However, transformer-based models have neither been compared with alternative architectures nor adequately evaluated for their sensitivity to the linguistic factors affecting human reading. Here we address these issues by evaluating the performance of a pool of neural networks in classifying English eye-fixation data as a function of both lexical and contextual factors. We show that (i) the accuracy of transformer-based models has largely been overestimated, (ii) other, simpler models make comparable or even better predictions, (iii) most models are sensitive to some of the major lexical factors accounting for at least 50% of human fixation variance, and (iv) most models fail to capture some significant context-sensitive interactions, such as those accounting for spillover effects in reading. The work shows the benefits of combining accuracy-based evaluation metrics with non-linear regression modelling of fixed and random effects on both real and simulated eye-tracking data.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Alessandro Lento en
dc.authority.people Andrea Nadalini en
dc.authority.people Nadia Khlif en
dc.authority.people Vito Pirrelli en
dc.authority.people Claudia Marzi en
dc.authority.people Marcello Ferro en
dc.authority.project SAC.AD002.173 en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2024/12/18 12:39:36 -
dc.date.available 2024/12/18 12:39:36 -
dc.date.firstsubmission 2024/12/18 09:42:19 *
dc.date.issued 2024 -
dc.date.submission 2025/03/06 17:51:19 *
dc.description.allpeople Lento, Alessandro; Nadalini, Andrea; Khlif, Nadia; Pirrelli, Vito; Marzi, Claudia; Ferro, Marcello -
dc.description.allpeopleoriginal Alessandro Lento, Andrea Nadalini, Nadia Khlif, Vito Pirrelli, Claudia Marzi, Marcello Ferro en
dc.description.fulltext open en
dc.description.international si en
dc.description.note ISSN 1613-0073 en
dc.description.numberofauthors 6 -
dc.identifier.isbn 979-12-210-7060-6 en
dc.identifier.scopus 2-s2.0-85214372943 en
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/519724 -
dc.identifier.url https://ceur-ws.org/Vol-3878/ en
dc.identifier.url https://aclanthology.org/2024.clicit-1.0.pdf en
dc.language.iso eng en
dc.publisher.country DEU en
dc.publisher.name CEUR en
dc.publisher.place Aachen en
dc.relation.allauthors Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli en
dc.relation.conferencedate December 4-6, 2024 en
dc.relation.conferencename Italian Conference on Computational Linguistics (CLiC-it) en
dc.relation.conferenceplace Pisa en
dc.relation.ispartofbook Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024) en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 10 en
dc.relation.projectAcronym ReadGround en
dc.relation.projectAwardNumber B55F21006650001 en
dc.relation.projectAwardTitle ReadGround en
dc.relation.projectFunderName Consiglio Nazionale delle Ricerche en
dc.relation.projectFundingStream Progetti@CNR I AVVISO – 2020 en
dc.relation.volume Vol-3878 en
dc.subject.keywordseng eye-tracking, eye fixation time prediction, neural network, contextual word embeddings, lexical features -
dc.subject.singlekeyword eye-tracking *
dc.subject.singlekeyword eye fixation time prediction *
dc.subject.singlekeyword neural network *
dc.subject.singlekeyword contextual word embeddings *
dc.subject.singlekeyword lexical features *
dc.title Comparative Evaluation of Computational Models Predicting Eye Fixation Patterns During Reading: Insights from Transformers and Simpler Architectures en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.impactfactor si en
dc.type.invited contributo en
dc.type.miur 273 -
dc.type.referee Esperti anonimi en
iris.mediafilter.data 2025/04/04 04:22:12 *
iris.orcid.lastModifiedDate 2025/03/06 17:53:07 *
iris.orcid.lastModifiedMillisecond 1741279987679 *
iris.scopus.extIssued 2024 -
iris.scopus.extTitle Comparative Evaluation of Computational Models Predicting Eye Fixation Patterns During Reading: Insights from Transformers and Simpler Architectures -
iris.sitodocente.maxattempts 1 -
scopus.authority.anceserie CEUR WORKSHOP PROCEEDINGS###1613-0073 *
scopus.category 1700 *
scopus.contributor.affiliation Università Campus Bio-Medico -
scopus.contributor.affiliation Istituto di Linguistica Computazionale "A. Zampolli" -
scopus.contributor.affiliation University Mohammed First -
scopus.contributor.affiliation Istituto di Linguistica Computazionale "A. Zampolli" -
scopus.contributor.affiliation Istituto di Linguistica Computazionale "A. Zampolli" -
scopus.contributor.affiliation Istituto di Linguistica Computazionale "A. Zampolli" -
scopus.contributor.afid 60005308 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60013094 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 58941792900 -
scopus.contributor.auid 57192941272 -
scopus.contributor.auid 57731783300 -
scopus.contributor.auid 14833305800 -
scopus.contributor.auid 36621334800 -
scopus.contributor.auid 15759406100 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Morocco -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Alessandro -
scopus.contributor.name Andrea -
scopus.contributor.name Nadia -
scopus.contributor.name Vito -
scopus.contributor.name Claudia -
scopus.contributor.name Marcello -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Consiglio Nazionale delle Ricerche; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Consiglio Nazionale delle Ricerche; -
scopus.contributor.subaffiliation Consiglio Nazionale delle Ricerche; -
scopus.contributor.subaffiliation Consiglio Nazionale delle Ricerche; -
scopus.contributor.surname Lento -
scopus.contributor.surname Nadalini -
scopus.contributor.surname Khlif -
scopus.contributor.surname Pirrelli -
scopus.contributor.surname Marzi -
scopus.contributor.surname Ferro -
scopus.date.issued 2024 *
scopus.description.allpeopleoriginal Lento A.; Nadalini A.; Khlif N.; Pirrelli V.; Marzi C.; Ferro M. *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.authority.anceserie *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.relation.conferenceplace *
scopus.differences scopus.relation.volume *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.pui 646179193 *
scopus.identifier.scopus 2-s2.0-85214372943 *
scopus.journal.sourceid 21100218356 *
scopus.language.iso eng *
scopus.publisher.name CEUR-WS *
scopus.relation.conferencedate 2024 *
scopus.relation.conferencename 10th Italian Conference on Computational Linguistics, CLiC-it 2024 *
scopus.relation.conferenceplace ita *
scopus.relation.volume 3878 *
scopus.subject.keywords contextual word embeddings; eye fixation time prediction; eye-tracking; lexical features; neural network; *
scopus.title Comparative Evaluation of Computational Models Predicting Eye Fixation Patterns During Reading: Insights from Transformers and Simpler Architectures *
scopus.titleeng Comparative Evaluation of Computational Models Predicting Eye Fixation Patterns During Reading: Insights from Transformers and Simpler Architectures *
Appears in types: 04.01 Contributo in Atti di convegno (contribution in conference proceedings)
Files in this item:
File: CLiC_2024_light.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 4.4 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/519724