Tell Me More: A Dataset of Visual Scene Description Sequences

Nikolai Ilinykh, Sina Zarrieß, David Schlangen


Abstract
We present a dataset consisting of what we call image description sequences, which are multi-sentence descriptions of the contents of an image. These descriptions were collected in a pseudo-interactive setting, where the describer was told to describe the given image to a listener who needs to identify the image within a set of images, and who successively asks for more information. As we show, this setup produced nicely structured data that, we think, will be useful for learning models capable of planning and realising such description discourses.
Anthology ID:
W19-8621
Volume:
Proceedings of the 12th International Conference on Natural Language Generation
Month:
October–November
Year:
2019
Address:
Tokyo, Japan
Editors:
Kees van Deemter, Chenghua Lin, Hiroya Takamura
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
152–157
URL:
https://aclanthology.org/W19-8621
DOI:
10.18653/v1/W19-8621
Cite (ACL):
Nikolai Ilinykh, Sina Zarrieß, and David Schlangen. 2019. Tell Me More: A Dataset of Visual Scene Description Sequences. In Proceedings of the 12th International Conference on Natural Language Generation, pages 152–157, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
Tell Me More: A Dataset of Visual Scene Description Sequences (Ilinykh et al., INLG 2019)
PDF:
https://aclanthology.org/W19-8621.pdf
Data
Image Description Sequences