David Robert Reich


2025

pdf bib
Automatic detection of dyslexia based on eye movements during reading in Russian
Anna Laurinavichyute | Anastasiya Lopukhina | David Robert Reich
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Dyslexia, a common learning disability, requires early diagnosis. However, current screening tests are very time- and resource-consuming. We present an LSTM that aims to automatically classify dyslexia based on eye movements recorded during natural reading, combined with basic demographic information and linguistic features. The proposed model reaches an AUC of 0.93 and outperforms the state-of-the-art model by 7%. We report several ablation studies demonstrating that the fixation features matter the most for classification.
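The classification pipeline described above can be sketched roughly as follows: an LSTM consumes one feature vector per fixation, and its final hidden state is concatenated with demographic features before a sigmoid readout. This is a minimal numpy sketch, not the paper's implementation; the specific fixation features named in the comments and all dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; input, forget, candidate, and output gates stacked in W/U/b."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def classify_reader(fixations, demographics, params):
    """Run the LSTM over per-fixation feature vectors, then combine the
    final hidden state with demographic features for a sigmoid readout."""
    W, U, b, w_out, b_out = params
    hidden = W.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in fixations:  # one feature vector per fixation
        h, c = lstm_step(x, h, c, W, U, b)
    features = np.concatenate([h, demographics])
    return sigmoid(w_out @ features + b_out)  # P(dyslexia)

# Toy dimensions: 4 fixation features (e.g. duration, landing position,
# word length, word frequency -- hypothetical choices) + 2 demographic features.
rng = np.random.default_rng(0)
d_in, hidden, d_demo = 4, 8, 2
params = (rng.normal(size=(4 * hidden, d_in)),
          rng.normal(size=(4 * hidden, hidden)),
          np.zeros(4 * hidden),
          rng.normal(size=hidden + d_demo),
          0.0)
p = classify_reader(rng.normal(size=(12, d_in)), np.array([0.3, 1.0]), params)
```

In practice such a model would be trained with a binary cross-entropy objective on labeled readers; the sketch only shows the forward pass.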

pdf bib
AlEYEgnment: Leveraging Eye‐Tracking‐While‐Reading to Align Language Models with Human Preferences
Anna Bondar | David Robert Reich | Lena Ann Jäger
Proceedings of the First International Workshop on Gaze Data and Natural Language Processing

Direct Preference Optimisation (DPO) has emerged as an effective approach for aligning large language models (LLMs) with human preferences. However, its reliance on binary feedback restricts its ability to capture nuanced human judgements. To address this limitation, we introduce a gaze-informed extension that incorporates implicit, fine-grained signals from eye-tracking-while-reading into the DPO framework. Eye movements, reflecting real-time human cognitive processing, provide fine-grained signals about the linguistic characteristics of the text that is being read. We leverage these signals and modify DPO by introducing an additional gaze-based loss term that quantifies the differences between the model’s internal sentence representations and cognitive (i.e., gaze-based) representations derived from the readers’ gaze patterns. We explore the use of both human and synthetic gaze signals, employing a generative model of eye movements in reading to generate supplementary training data, ensuring the scalability of our approach. We apply the proposed approach to modelling linguistic acceptability. Experiments conducted on the CoLA dataset demonstrate performance gains in grammatical acceptability classification tasks when the models are trained in the gaze-augmented setting. These results demonstrate the utility of leveraging gaze data to align language models with human preferences. All code and data are available on GitHub.
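The objective described above can be sketched as the standard DPO preference loss plus an auxiliary term comparing model and gaze-derived representations. This is a minimal sketch under assumptions: the cosine-distance form of the gaze term and the mixing weight `lam` are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gaze_dpo_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected,
                  model_repr, gaze_repr, beta=0.1, lam=0.5):
    """Standard DPO preference loss plus a cosine-distance term that pulls
    the model's sentence representation toward a gaze-derived representation.
    The cosine form and the weight `lam` are illustrative assumptions."""
    # DPO: log-sigmoid of the scaled log-probability margin over a reference model
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo_term = -np.log(sigmoid(margin))
    # Auxiliary gaze term: 1 - cosine similarity between representations
    cos = (model_repr @ gaze_repr
           / (np.linalg.norm(model_repr) * np.linalg.norm(gaze_repr)))
    return dpo_term + lam * (1.0 - cos)
```

The combined loss falls both when the policy separates chosen from rejected responses more strongly than the reference model and when the model's representation aligns with the gaze-based one.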

2024

pdf bib
Reverse-Engineering the Reader
Samuel Kiegeland | Ethan Wilcox | Afra Amini | David Robert Reich | Ryan Cotterell
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Numerous previous studies have sought to determine to what extent language models, pretrained on natural language text, can serve as useful models of human cognition. In this paper, we are interested in the opposite question: whether we can directly optimize a language model to be a useful cognitive model by aligning it to human psychometric data. To achieve this, we introduce a novel alignment technique in which we fine-tune a language model to implicitly optimize the parameters of a linear regressor that directly predicts humans’ reading times of in-context linguistic units, e.g., phonemes, morphemes, or words, using surprisal estimates derived from the language model. Using words as a test case, we evaluate our technique across multiple model sizes and datasets and find that it improves language models’ psychometric predictive power. However, we find an inverse relationship between psychometric power and a model’s performance on downstream NLP tasks as well as its perplexity on held-out test data. While this latter trend has been observed before (Oh et al., 2022; Shain et al., 2024), we are the first to induce it by manipulating a model’s alignment to psychometric data.
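The core quantity in the setup above, a linear regressor predicting reading times from the language model's surprisal estimates, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it fits the regressor in closed form and returns its mean squared error, the kind of signal that would be minimized when fine-tuning the language model.

```python
import numpy as np

def psychometric_predictive_loss(surprisal, reading_times):
    """Fit a linear regressor reading_time ~ surprisal (plus intercept)
    in closed form via ordinary least squares and return its mean squared
    error. Lower values mean the language model's surprisal estimates
    predict human reading times better."""
    # Design matrix: one column for surprisal, one for the intercept
    X = np.stack([surprisal, np.ones_like(surprisal)], axis=1)
    coef, *_ = np.linalg.lstsq(X, reading_times, rcond=None)
    residuals = reading_times - X @ coef
    return float(np.mean(residuals ** 2))
```

Fine-tuning the language model to reduce this loss changes its surprisal estimates so that they track human reading times more closely, which is what the abstract means by implicitly optimizing the regressor's parameters.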