TEXT-CAKE: Challenging Language Models on Local Text Coherence

Luca Dini, Dominique Brunato, Felice Dell’Orletta, Tommaso Caselli


Abstract
We present a deep investigation of encoder-based Language Models (LMs) on their abilities to detect text coherence across four languages and four text genres using a new evaluation benchmark, TEXT-CAKE. We analyze both multilingual and monolingual LMs with varying architectures and parameters in different finetuning settings. Our findings demonstrate that identifying subtle perturbations that disrupt local coherence is still a challenging task. Furthermore, our results underline the importance of using diverse text genres during pre-training and of an optimal pre-training objective and large vocabulary size. When controlling for other parameters, deep LMs (i.e., higher number of layers) have an advantage over shallow ones, even when the total number of parameters is smaller.
Anthology ID:
2025.coling-main.296
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
4384–4398
URL:
https://aclanthology.org/2025.coling-main.296/
Cite (ACL):
Luca Dini, Dominique Brunato, Felice Dell’Orletta, and Tommaso Caselli. 2025. TEXT-CAKE: Challenging Language Models on Local Text Coherence. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4384–4398, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
TEXT-CAKE: Challenging Language Models on Local Text Coherence (Dini et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.296.pdf