Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations

Özge Alacam, Eugen Ruppert, Amr Rekaby Salama, Tobias Staron, Wolfgang Menzel


Abstract
Eye4Ref is a rich multimodal dataset of eye-movement recordings collected in referentially complex situated settings where the linguistic utterances and their visual referential world were available to the listener. It contains not only fixation parameters but also saccadic movement parameters that are time-locked to the accompanying German utterances (with English translations). Additionally, it contains symbolic (contextual) knowledge representations of the images that map the referring expressions onto the objects in the corresponding images. Overall, the data was collected from 62 participants in three different experimental setups (86 systematically controlled sentence–image pairs and 1844 eye-movement recordings). Referential complexity was controlled by visual manipulations (e.g., the number of objects or the visibility of the target items) and by linguistic manipulations (e.g., the position of the disambiguating word in a sentence). This multimodal dataset, in which three different sources of information, namely eye-tracking, language, and the visual environment, are aligned, supports a variety of research questions not only from a language perspective but also from a computer-vision perspective.
Anthology ID: 2020.lrec-1.292
Volume: Proceedings of the 12th Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 2396–2404
Language: English
URL: https://aclanthology.org/2020.lrec-1.292
PDF: https://aclanthology.org/2020.lrec-1.292.pdf