Testing the Predictions of Surprisal Theory in 11 Languages

Ethan G. Wilcox; Tiago Pimentel; Clara Meister; Ryan Cotterell; Roger Levy

doi:10.1162/tacl_a_00612

Abstract

Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
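The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes a toy next-word distribution (the words and probabilities below are hypothetical, not from the paper) and base-2 logarithms, so values are in bits; the abstract itself does not fix the log base, and the function names are illustrative only.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal of an outcome: negative log probability (base 2, so bits)."""
    return -math.log2(prob)


def contextual_entropy(dist: dict[str, float]) -> float:
    """Expected surprisal over a next-word distribution in a given context."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


# Hypothetical distribution over the next word in some context (assumption).
next_word = {"dog": 0.5, "cat": 0.25, "fish": 0.25}

per_word = {word: surprisal(p) for word, p in next_word.items()}
entropy = contextual_entropy(next_word)

print(per_word)  # surprisal of each candidate word, in bits
print(entropy)   # contextual entropy of the distribution, in bits
```

In this toy case the less probable words receive higher surprisal, and the entropy is the probability-weighted average of those values. In the paper the probabilities come from trained language models rather than a hand-written table.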

Anthology ID: 2023.tacl-1.82
Volume: Transactions of the Association for Computational Linguistics, Volume 11
Year: 2023
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 1451–1470
URL: https://aclanthology.org/2023.tacl-1.82
DOI: 10.1162/tacl_a_00612
Cite (ACL): Ethan G. Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, and Roger P. Levy. 2023. Testing the Predictions of Surprisal Theory in 11 Languages. Transactions of the Association for Computational Linguistics, 11:1451–1470.
Cite (Informal): Testing the Predictions of Surprisal Theory in 11 Languages (Wilcox et al., TACL 2023)
PDF: https://aclanthology.org/2023.tacl-1.82.pdf
