CNR Institutional Research Information System

TEXT-CAKE: Challenging Language Models on Local Text Coherence

Dini L.; Brunato D.; Dell'Orletta F.; Caselli T.

2025

Abstract

We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different fine-tuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.

Full record (DC)

DC field	Value	Language
dc.authority.anceserie	INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS	en
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Dini L.	en
dc.authority.people	Brunato D.	en
dc.authority.people	Dell'Orletta F.	en
dc.authority.people	Caselli T.	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.date.accessioned	2026/03/03 14:30:05	-
dc.date.available	2026/03/03 14:30:05	-
dc.date.firstsubmission	2026/03/02 18:57:41	*
dc.date.issued	2025	-
dc.date.submission	2026/03/02 18:57:41	*
dc.description.abstracteng	We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different fine-tuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.	-
dc.description.allpeople	Dini, L.; Brunato, D.; Dell'Orletta, F.; Caselli, T.	-
dc.description.allpeopleoriginal	Dini L.; Brunato D.; Dell'Orletta F.; Caselli T.	en
dc.description.fulltext	open	en
dc.description.numberofauthors	4	-
dc.identifier.scopus	2-s2.0-85218500743	-
dc.identifier.source	scopus	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/570521	-
dc.language.iso	eng	en
dc.publisher.name	Association for Computational Linguistics (ACL)	en
dc.relation.conferencedate	2025	en
dc.relation.conferencename	31st International Conference on Computational Linguistics, COLING 2025	en
dc.relation.firstpage	4384	en
dc.relation.ispartofbook	Proceedings - International Conference on Computational Linguistics, COLING	en
dc.relation.lastpage	4398	en
dc.relation.numberofpages	15	en
dc.subject.keywordseng	Large Language Models (LLMs)	-
dc.subject.keywordseng	Text Coherence	-
dc.subject.singlekeyword	Large Language Models (LLMs)	*
dc.subject.singlekeyword	Text Coherence	*
dc.title	TEXT-CAKE: Challenging Language Models on Local Text Coherence	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
iris.mediafilter.data	2026/03/04 02:52:26	*
iris.orcid.lastModifiedDate	2026/03/04 02:09:47	*
iris.orcid.lastModifiedMillisecond	1772586587119	*
iris.scopus.extIssued	2025	-
iris.scopus.extTitle	TEXT-CAKE: Challenging Language Models on Local Text Coherence	-
iris.sitodocente.maxattempts	1	-
scopus.authority.anceserie	INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS###2951-2093	*
scopus.category	2614	*
scopus.category	1706	*
scopus.category	1703	*
scopus.contributor.affiliation	University of Pisa	-
scopus.contributor.affiliation	ItaliaNLP Lab	-
scopus.contributor.affiliation	ItaliaNLP Lab	-
scopus.contributor.affiliation	University of Groningen	-
scopus.contributor.afid	60028868	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	60010023	-
scopus.contributor.auid	35185041000	-
scopus.contributor.auid	55237740200	-
scopus.contributor.auid	57540567000	-
scopus.contributor.auid	35932126700	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Netherlands	-
scopus.contributor.dptid		-
scopus.contributor.dptid	114087935	-
scopus.contributor.dptid	114087935	-
scopus.contributor.dptid		-
scopus.contributor.name	Luca	-
scopus.contributor.name	Dominique	-
scopus.contributor.name	Felice	-
scopus.contributor.name	Tommaso	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC);	-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC);	-
scopus.contributor.subaffiliation	Center for Language and Cognition (CLCG);	-
scopus.contributor.surname	Dini	-
scopus.contributor.surname	Brunato	-
scopus.contributor.surname	Dell'Orletta	-
scopus.contributor.surname	Caselli	-
scopus.date.issued	2025	*
scopus.description.abstracteng	We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different fine-tuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.	*
scopus.description.allpeopleoriginal	Dini L.; Brunato D.; Dell'Orletta F.; Caselli T.	*
scopus.differences	scopus.identifier.isbn	*
scopus.differences	scopus.relation.conferenceplace	*
scopus.document.type	cp	*
scopus.document.types	cp	*
scopus.identifier.isbn	9798891761964	*
scopus.identifier.pui	646571713	*
scopus.identifier.scopus	2-s2.0-85218500743	*
scopus.journal.sourceid	21101167500	*
scopus.language.iso	eng	*
scopus.publisher.name	Association for Computational Linguistics (ACL)	*
scopus.relation.conferencedate	2025	*
scopus.relation.conferencename	31st International Conference on Computational Linguistics, COLING 2025	*
scopus.relation.conferenceplace	are	*
scopus.relation.firstpage	4384	*
scopus.relation.lastpage	4398	*
scopus.title	TEXT-CAKE: Challenging Language Models on Local Text Coherence	*
scopus.titleeng	TEXT-CAKE: Challenging Language Models on Local Text Coherence	*
Appears in the collections:	04.01 Contributo in Atti di convegno

Files in this item:

File	Size	Format
2025.coling-main.296.pdf (open access; license: Creative Commons)	1.29 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/570521
