Marina Björnsdóttir


2024

Reading Does Not Equal Reading: Comparing, Simulating and Exploiting Reading Behavior across Populations
David R. Reich | Shuwen Deng | Marina Björnsdóttir | Lena Jäger | Nora Hollenstein
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Eye-tracking-while-reading corpora play a crucial role in the study of human language processing, and, more recently, have been leveraged for cognitively enhancing neural language models. A critical limitation of existing corpora is that they often lack diversity, comprising primarily native speakers. In this study, we expand the eye-tracking-while-reading dataset CopCo, which initially included only Danish L1 readers with and without dyslexia, by incorporating a new dataset of L2 readers with diverse L1 backgrounds. Thus, the extended CopCo corpus constitutes the first eye-tracking-while-reading dataset encompassing neurotypical L1 readers, L1 readers with dyslexia, and L2 readers, all reading the same materials. We first provide extensive descriptive statistics of the extended CopCo corpus. Second, we investigate how different degrees of diversity of the training data affect a state-of-the-art generative model of eye movements in reading. Finally, we use this scanpath generation model for gaze-augmented language modeling and investigate the impact of diversity in the training data on the model’s performance on a range of NLP downstream tasks. The code can be found here: https://github.com/norahollenstein/copco-processing.

2023

Dyslexia Prediction from Natural Reading of Danish Texts
Marina Björnsdóttir | Nora Hollenstein | Maria Barrett
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Dyslexia screening in adults is an open challenge since difficulties may not align with standardised tests designed for children. We collect eye-tracking data from natural reading of Danish texts from readers with dyslexia while closely following the experimental design of a corpus of readers without dyslexia. Research suggests that the opaque orthography of the Danish language affects the diagnostic characteristics of dyslexia. To the best of our knowledge, this is the first attempt to classify dyslexia from eye movements during reading in Danish. We experiment with various machine-learning methods, and our best model yields an F1 score of 0.85.

2022

The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts
Nora Hollenstein | Maria Barrett | Marina Björnsdóttir
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Eye movement recordings from reading are one of the richest signals of human language processing. Corpora of eye movements during reading of contextualized running text are a way of making such records available for natural language processing purposes. Such corpora already exist in some languages. We present CopCo, the Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts. It is the first eye tracking corpus of its kind for the Danish language. CopCo includes 1,832 sentences with 34,897 tokens of Danish text extracted from a collection of speech manuscripts. This first release of the corpus contains eye tracking data from 22 participants. It will be extended continuously with more participants and texts from other genres. We assess the data quality of the recorded eye movements and find that the extracted features are in line with related research. The dataset is available here: https://osf.io/ud8s5/.