SNAG: Spoken Narratives and Gaze Dataset

Preethi Vaidyanathan, Emily T. Prud’hommeaux, Jeff B. Pelz, Cecilia O. Alm
Abstract
Humans rely on multiple sensory modalities when examining and reasoning over images. In this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. The task was performed by multiple participants on 100 general-domain images showing everyday objects and activities. We demonstrate the usefulness of the dataset by applying an existing visual-linguistic data fusion framework in order to label important image regions with appropriate linguistic labels.
Anthology ID:
P18-2022
Volume:
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Iryna Gurevych, Yusuke Miyao
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
132–137
URL:
https://aclanthology.org/P18-2022
DOI:
10.18653/v1/P18-2022
Cite (ACL):
Preethi Vaidyanathan, Emily T. Prud’hommeaux, Jeff B. Pelz, and Cecilia O. Alm. 2018. SNAG: Spoken Narratives and Gaze Dataset. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 132–137, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
SNAG: Spoken Narratives and Gaze Dataset (Vaidyanathan et al., ACL 2018)
PDF:
https://aclanthology.org/P18-2022.pdf
Poster:
P18-2022.Poster.pdf
Data
MS COCO