TEXT-CAKE: Challenging Language Models on Local Text Coherence

Dini L.; Brunato D.; Dell'Orletta F.; Caselli T.
2025

Abstract

We present an in-depth investigation of the ability of encoder-based Language Models (LMs) to detect text coherence across four languages and four text genres, using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different fine-tuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence remains a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training, of an optimal pre-training objective, and of a large vocabulary size. When controlling for other parameters, deep LMs (i.e., those with a higher number of layers) have an advantage over shallow ones, even when their total number of parameters is smaller.
DC Field Value Language
dc.authority.anceserie INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Dini L. en
dc.authority.people Brunato D. en
dc.authority.people Dell'Orletta F. en
dc.authority.people Caselli T. en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2026/03/03 14:30:05 -
dc.date.available 2026/03/03 14:30:05 -
dc.date.firstsubmission 2026/03/02 18:57:41 *
dc.date.issued 2025 -
dc.date.submission 2026/03/02 18:57:41 *
dc.description.abstracteng We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different finetuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller. -
dc.description.allpeople Dini, L.; Brunato, D.; Dell'Orletta, F.; Caselli, T. -
dc.description.allpeopleoriginal Dini L.; Brunato D.; Dell'Orletta F.; Caselli T. en
dc.description.fulltext open en
dc.description.numberofauthors 4 -
dc.identifier.scopus 2-s2.0-85218500743 -
dc.identifier.source scopus *
dc.identifier.uri https://hdl.handle.net/20.500.14243/570521 -
dc.language.iso eng en
dc.publisher.name Association for Computational Linguistics (ACL) en
dc.relation.conferencedate 2025 en
dc.relation.conferencename 31st International Conference on Computational Linguistics, COLING 2025 en
dc.relation.firstpage 4384 en
dc.relation.ispartofbook Proceedings - International Conference on Computational Linguistics, COLING en
dc.relation.lastpage 4398 en
dc.relation.numberofpages 15 en
dc.subject.keywordseng Large Language Models (LLMs) -
dc.subject.keywordseng Text Coherence -
dc.subject.singlekeyword Large Language Models (LLMs) *
dc.subject.singlekeyword Text Coherence *
dc.title TEXT-CAKE: Challenging Language Models on Local Text Coherence en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
iris.mediafilter.data 2026/03/04 02:52:26 *
iris.orcid.lastModifiedDate 2026/03/04 02:09:47 *
iris.orcid.lastModifiedMillisecond 1772586587119 *
iris.scopus.extIssued 2025 -
iris.scopus.extTitle TEXT-CAKE: Challenging Language Models on Local Text Coherence -
iris.sitodocente.maxattempts 1 -
scopus.authority.anceserie INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS###2951-2093 *
scopus.category 2614 *
scopus.category 1706 *
scopus.category 1703 *
scopus.contributor.affiliation University of Pisa -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation University of Groningen -
scopus.contributor.afid 60028868 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60010023 -
scopus.contributor.auid 35185041000 -
scopus.contributor.auid 55237740200 -
scopus.contributor.auid 57540567000 -
scopus.contributor.auid 35932126700 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Netherlands -
scopus.contributor.dptid -
scopus.contributor.dptid 114087935 -
scopus.contributor.dptid 114087935 -
scopus.contributor.dptid -
scopus.contributor.name Luca -
scopus.contributor.name Dominique -
scopus.contributor.name Felice -
scopus.contributor.name Tommaso -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC); -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC); -
scopus.contributor.subaffiliation Center for Language and Cognition (CLCG); -
scopus.contributor.surname Dini -
scopus.contributor.surname Brunato -
scopus.contributor.surname Dell'Orletta -
scopus.contributor.surname Caselli -
scopus.date.issued 2025 *
scopus.description.abstracteng We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different finetuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller. *
scopus.description.allpeopleoriginal Dini L.; Brunato D.; Dell'Orletta F.; Caselli T. *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.isbn 9798891761964 *
scopus.identifier.pui 646571713 *
scopus.identifier.scopus 2-s2.0-85218500743 *
scopus.journal.sourceid 21101167500 *
scopus.language.iso eng *
scopus.publisher.name Association for Computational Linguistics (ACL) *
scopus.relation.conferencedate 2025 *
scopus.relation.conferencename 31st International Conference on Computational Linguistics, COLING 2025 *
scopus.relation.conferenceplace are *
scopus.relation.firstpage 4384 *
scopus.relation.lastpage 4398 *
scopus.title TEXT-CAKE: Challenging Language Models on Local Text Coherence *
scopus.titleeng TEXT-CAKE: Challenging Language Models on Local Text Coherence *
Appears in types: 04.01 Contributo in Atti di convegno
Files in this item:
File: 2025.coling-main.296.pdf
Access: open access
License: Creative Commons
Size: 1.29 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570521