What does Surprisal have to do with Information Status?

Andrew Thomas Dyer


Abstract
It is common in cognitive computational linguistics to use language model surprisal as a measure of the information content of units in language production. It is then tempting to apply surprisal to information structure and status, treating surprising mentions as new and unsurprising ones as given, which would provide a ready-made continuous metric of information givenness/newness. To test whether this conflation is appropriate, we perform regression experiments to determine whether language model surprisal is actually well predicted by manually annotated information status and, if so, whether this effect is separable from more trivial linguistic information such as parts of speech and word frequency. We find that information status alone is at best a very weak predictor of surprisal, and that surprisal is much better predicted by word frequency and by parts of speech, which are highly correlated with both information status and surprisal. We conclude that surprisal should not be used by itself as a continuous representation of information status.
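The kind of comparison the abstract describes can be illustrated with a minimal sketch. It assumes surprisal is measured in bits as the negative base-2 log of a word's probability in context, and it compares how much variance in surprisal a single categorical predictor explains. The function names, the label sets, and the six data points are invented for illustration only; they are not taken from the paper, and the paper's actual regression setup may differ.

```python
import math
from collections import defaultdict


def surprisal(prob: float) -> float:
    """Surprisal in bits (assumed unit): -log2 p(word | context)."""
    return -math.log2(prob)


def r_squared_categorical(labels, values):
    """R^2 of an ordinary least squares fit of `values` on one
    dummy-coded categorical predictor.

    With a single categorical predictor, the fitted value for each
    observation is simply the mean of its group.
    """
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_resid = sum(
        (v - sum(group) / len(group)) ** 2
        for group in groups.values()
        for v in group
    )
    return 1.0 - ss_resid / ss_total


# Invented toy data: language model probabilities for six mentions,
# with hypothetical information status and part-of-speech labels.
probs = [0.5, 0.25, 0.03125, 0.0625, 0.5, 0.015625]
status = ["given", "new", "given", "new", "given", "new"]
pos = ["PRON", "PRON", "NOUN", "NOUN", "PRON", "NOUN"]

y = [surprisal(p) for p in probs]
r2_status = r_squared_categorical(status, y)
r2_pos = r_squared_categorical(pos, y)
print(f"R^2 (information status): {r2_status:.3f}")
print(f"R^2 (part of speech):     {r2_pos:.3f}")
```

In this toy setup the part-of-speech labels explain more of the variance than the information status labels, but only because the data were constructed that way. A real analysis would fit the predictors jointly, together with word frequency, to test whether any information status effect survives once the other predictors are controlled for.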
Anthology ID:
2026.sigtyp-main.4
Volume:
Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Ekaterina Vylomova, Andrei Shcherbakov, Priya Rani
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
26–31
URL:
https://aclanthology.org/2026.sigtyp-main.4/
Cite (ACL):
Andrew Thomas Dyer. 2026. What does Surprisal have to do with Information Status? In Proceedings of the 8th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 26–31, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
What does Surprisal have to do with Information Status? (Dyer, SIGTYP 2026)
PDF:
https://aclanthology.org/2026.sigtyp-main.4.pdf