Addressing surprisal deficiencies in reading time models

Marten van Schijndel, William Schuler


Abstract
This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is not usually included in the surprisal measures. This study shows that correcting the surprisal calculation improves n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings of how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the surprisal correction despite the fact that lexical sequences skipped by saccades are processed by readers, as demonstrated by the corrected n-gram measure. These results raise questions about the formulation of information-theoretic measures of syntactic processing such as PCFG surprisal and entropy reduction when applied to reading times.
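To make the correction described in the abstract concrete, here is a minimal Python sketch of one plausible reading of it: the surprisal of words skipped by a saccade is added to the surprisal of the next fixated word, rather than being dropped. The function names, the per-word probabilities, and the fixation indices are hypothetical illustrations, not the authors' implementation or data.

```python
import math


def surprisal(prob):
    """Surprisal in bits of an event with probability `prob`."""
    return -math.log2(prob)


def accumulated_surprisal(word_probs, fixated):
    """Sum surprisal over each fixated word and the skipped words before it.

    word_probs: hypothetical conditional probabilities, one per word,
                as a language model (n-gram or PCFG) might assign them.
    fixated:    indices of the words the reader actually fixated.
    Returns a dict mapping each fixated index to the accumulated surprisal
    of that word plus all words skipped since the previous fixation.
    """
    fixated_set = set(fixated)
    totals = {}
    running = 0.0
    for i, p in enumerate(word_probs):
        running += surprisal(p)
        if i in fixated_set:
            totals[i] = running
            running = 0.0  # start a new saccade region
    return totals


# Toy example: word 1 is skipped, so its surprisal is credited to word 2.
probs = [0.5, 0.25, 0.125, 0.5]
result = accumulated_surprisal(probs, fixated=[0, 2, 3])
```

In this toy case an uncorrected measure would assign word 2 only its own 3 bits, while the accumulated measure assigns it 5 bits because the skipped word 1 contributes 2 more.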
Anthology ID: W16-4104
Volume: Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, Philippe Blache
Venue: CL4LC
Publisher: The COLING 2016 Organizing Committee
Pages: 32–37
URL: https://aclanthology.org/W16-4104
Cite (ACL): Marten van Schijndel and William Schuler. 2016. Addressing surprisal deficiencies in reading time models. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), pages 32–37, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): Addressing surprisal deficiencies in reading time models (van Schijndel & Schuler, CL4LC 2016)
PDF: https://aclanthology.org/W16-4104.pdf