From “Before” to “After”: Generating Natural Language Instructions from Image Pairs in a Simple Visual Domain

Robin Rojowiec; Jana Götze; Philipp Sadler; Henrik Voigt; Sina Zarrieß; David Schlangen

doi:10.18653/v1/2020.inlg-1.38

Abstract

While certain types of instructions can be compactly expressed via images, there are situations where one might want to verbalise them, for example when directing someone. We investigate the task of Instruction Generation from Before/After Image Pairs, which is to derive from images an instruction for effecting the implied change. For this, we make use of prior work on instruction following in a visual environment. We take an existing dataset, the BLOCKS data collected by Bisk et al. (2016), and investigate whether it is suitable for training an instruction generator as well. We find that it is, and investigate several simple baselines, taking these from the related task of image captioning. Through a series of experiments that simplify the task (by making image processing easier or completely side-stepping it, and by creating template-based targeted instructions), we investigate areas for improvement. We find that captioning models get some way towards solving the task but have some difficulty with it, and that future improvements must lie in the way the change is detected in the instruction.
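To make the task concrete, consider its most simplified form, where image processing is side-stepped entirely and the "before" and "after" scenes are given as symbolic block coordinates. The following is a minimal, hypothetical template-based sketch, not the authors' method. It assumes that exactly one block moves between the two states, that both states list the same blocks, and that positions are 2D (x to the right, y upward). The function names, block labels, and template wording are all invented for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical symbolic scene: block label -> (x, y) position, x rightward, y upward.
State = Dict[str, Tuple[float, float]]


def find_moved_block(before: State, after: State, tol: float = 1e-3) -> Optional[str]:
    """Return the label of the single block whose position changed, else None."""
    moved = [
        name
        for name in before
        if max(abs(a - b) for a, b in zip(before[name], after[name])) > tol
    ]
    return moved[0] if len(moved) == 1 else None


def nearest_landmark(after: State, moved: str) -> str:
    """Pick the other block closest to the moved block's new position as the reference."""
    mx, my = after[moved]
    others = [name for name in after if name != moved]
    return min(others, key=lambda n: (after[n][0] - mx) ** 2 + (after[n][1] - my) ** 2)


def coarse_relation(after: State, moved: str, ref: str) -> str:
    """Map the displacement from landmark to moved block onto a coarse spatial phrase."""
    dx = after[moved][0] - after[ref][0]
    dy = after[moved][1] - after[ref][1]
    if abs(dx) >= abs(dy):
        return "to the right of" if dx > 0 else "to the left of"
    return "above" if dy > 0 else "below"


def template_instruction(before: State, after: State) -> Optional[str]:
    """Detect the change between the two states and verbalise it with a fixed template."""
    moved = find_moved_block(before, after)
    if moved is None:
        return None  # no change, or more than one block moved: outside this sketch's assumptions
    ref = nearest_landmark(after, moved)
    return f"Move block {moved} {coarse_relation(after, moved, ref)} block {ref}."


if __name__ == "__main__":
    before = {"3": (0.0, 0.0), "7": (2.0, 0.0), "12": (5.0, 5.0)}
    after = {"3": (1.0, 0.0), "7": (2.0, 0.0), "12": (5.0, 5.0)}
    print(template_instruction(before, after))  # Move block 3 to the left of block 7.
```

The split into change detection (`find_moved_block`) and verbalisation (`coarse_relation` plus the template) mirrors the abstract's point that the difficult part lies in detecting the change; once the change is known symbolically, producing a template instruction is straightforward.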

Anthology ID:: 2020.inlg-1.38
Original:: 2020.inlg-1.38v1
Version 2:: 2020.inlg-1.38v2
Volume:: Proceedings of the 13th International Conference on Natural Language Generation
Month:: December
Year:: 2020
Address:: Dublin, Ireland
Editors:: Brian Davis, Yvette Graham, John Kelleher, Yaji Sripada
Venue:: INLG
SIG:: SIGGEN
Publisher:: Association for Computational Linguistics
Pages:: 316–326
URL:: https://aclanthology.org/2020.inlg-1.38
DOI:: 10.18653/v1/2020.inlg-1.38
Cite (ACL):: Robin Rojowiec, Jana Götze, Philipp Sadler, Henrik Voigt, Sina Zarrieß, and David Schlangen. 2020. From “Before” to “After”: Generating Natural Language Instructions from Image Pairs in a Simple Visual Domain. In Proceedings of the 13th International Conference on Natural Language Generation, pages 316–326, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):: From “Before” to “After”: Generating Natural Language Instructions from Image Pairs in a Simple Visual Domain (Rojowiec et al., INLG 2020)
PDF:: https://aclanthology.org/2020.inlg-1.38.pdf
Supplementary attachment:: 2020.inlg-1.38.Supplementary_Attachment.pdf
