Extracting structure from an LLM - how to improve on surprisal-based models of Human Language Processing

Daphne P. Wang; Mehrnoosh Sadrzadeh; Miloš Stanojević; Wing-Yee Chow; Richard Breheny

Abstract

Prediction and reanalysis are considered two key processes that underlie humans’ capacity to comprehend language in real time. Computational models capture them using Large Language Models (LLMs) and a statistical measure known as ‘surprisal’. Despite the successes of LLMs, surprisal-based models face challenges with sentences that require reanalysis due to pervasive temporary structural ambiguities, such as garden path sentences. We ask whether structural information can be extracted from LLMs and develop a model that integrates it with their learnt statistics. When applied to a dataset of garden path sentences, the model achieved a significantly higher correlation with human reading times than surprisal. It also provided a better prediction of the garden path effect and could distinguish between sentence types with different levels of difficulty.
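For readers unfamiliar with the measure, surprisal is conventionally defined as the negative log probability of a word given its preceding context. The following is a minimal sketch of that definition only, not the model described in the paper; the function name `surprisal` and the probabilities in `toy_probs` are hypothetical and chosen purely for illustration.

```python
import math


def surprisal(prob: float) -> float:
    """Return the surprisal, in bits, of an event with probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in the interval (0, 1]")
    return -math.log2(prob)


# Hypothetical next-word probabilities (assumed values, not produced by any
# LLM and not taken from the paper).
toy_probs = {"expected": 0.5, "unexpected": 0.125}

# Lower probability means higher surprisal.
toy_surprisal = {word: surprisal(p) for word, p in toy_probs.items()}
```

In a surprisal-based model, the probability would come from a language model's next-word distribution, and higher surprisal is taken to predict longer reading times.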

Anthology ID: 2025.coling-main.329
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 4938–4944
URL: https://aclanthology.org/2025.coling-main.329/
Cite (ACL): Daphne P. Wang, Mehrnoosh Sadrzadeh, Miloš Stanojević, Wing-Yee Chow, and Richard Breheny. 2025. Extracting structure from an LLM - how to improve on surprisal-based models of Human Language Processing. In Proceedings of the 31st International Conference on Computational Linguistics, pages 4938–4944, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Extracting structure from an LLM - how to improve on surprisal-based models of Human Language Processing (Wang et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.329.pdf
