Fabian Kögel

2024

InteRead: An Eye Tracking Dataset of Interrupted Reading
Francesca Zermiani | Prajit Dhar | Ekta Sood | Fabian Kögel | Andreas Bulling | Maria Wirzberger
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Eye movements during reading offer a window into cognitive processes and language comprehension, but the scarcity of reading data with interruptions – which learners frequently encounter in their everyday learning environments – hampers advances in the development of intelligent learning technologies. We introduce InteRead – a novel 50-participant dataset of gaze data recorded during self-paced reading of real-world text. InteRead further offers fine-grained annotations of interruptions interspersed throughout the text as well as resumption lags incurred by these interruptions. Interruptions were triggered automatically once readers reached predefined target words. We validate our dataset by reporting interdisciplinary analyses on different measures of gaze behavior. In line with prior research, our analyses show that the interruptions as well as word length and word frequency effects significantly impact eye movements during reading. We also explore individual differences within our dataset, shedding light on the potential for tailored educational solutions. InteRead is accessible from our datasets web-page: https://www.ife.uni-stuttgart.de/en/llis/research/datasets/.

2021

pdf bib abs

VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood | Fabian Kögel | Florian Strohm | Prajit Dhar | Andreas Bulling
Proceedings of the 25th Conference on Computational Natural Language Learning

We present VQA-MHUG – a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker. We use our dataset to analyze the similarity between human and neural attentive strategies learned by five state-of-the-art VQA models: Modular Co-Attention Network (MCAN) with either grid or region features, Pythia, Bilinear Attention Network (BAN), and the Multimodal Factorized Bilinear Pooling Network (MFB). While prior work has focused on studying the image modality, our analyses show – for the first time – that for all models, higher correlation with human attention on text is a significant predictor of VQA performance. This finding points at a potential for improving VQA performance and, at the same time, calls for further research on neural text attention mechanisms and their integration into architectures for vision and language tasks, including but potentially also beyond VQA.

Co-authors

Francesca Zermiani 1

Venues

Fix author