Obj2Text: Generating Visually Descriptive Language from Object Layouts

Xuwang Yin; Vicente Ordonez

doi:10.18653/v1/D17-1017

Abstract

Generating captions for images is a task that has recently received considerable attention. Another type of visual input is an abstract scene or object layout, where the only information provided is a set of objects and their locations. This type of imagery is commonly found in many applications in computer graphics, virtual reality, and storyboarding. In this paper we explore OBJ2TEXT, a sequence-to-sequence model that encodes a set of objects and their locations as an input sequence using an LSTM network, and decodes this representation using an LSTM language model. We show that this model, despite using a sequence encoder, can effectively represent complex spatial object-object relationships and produce descriptions that are globally coherent and semantically relevant. We test our approach on the task of describing object layouts in the MS-COCO dataset by producing sentences given only object annotations. We additionally show that our model, combined with a state-of-the-art object detector, can improve the accuracy of an image captioning model.
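
To make the input side of the abstract concrete, the following is a minimal sketch of how a set of objects and their locations might be flattened into a sequence before it reaches an encoder. It is not the authors' implementation: the `LayoutObject` type, the `layout_to_sequence` helper, the `(x, y, width, height)` box convention, the normalization to [0, 1], and the left-to-right ordering are all assumptions made for illustration. The LSTM encoder and decoder themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Location = Tuple[float, float, float, float]


@dataclass
class LayoutObject:
    """Hypothetical container for one annotated object in a layout."""
    category: str        # object class name, e.g. "dog"
    bbox: Location       # assumed (x, y, width, height) in pixels


def layout_to_sequence(
    objects: List[LayoutObject], image_width: float, image_height: float
) -> List[Tuple[str, Location]]:
    """Turn an object layout into an ordered list of (category, location) steps.

    Each box is rescaled to [0, 1] so the sequence does not depend on image
    resolution. Sorting by the normalized x, then y, coordinate is an
    assumption of this sketch, not a detail stated in the abstract.
    """
    steps = []
    for obj in objects:
        x, y, w, h = obj.bbox
        location = (
            x / image_width,
            y / image_height,
            w / image_width,
            h / image_height,
        )
        steps.append((obj.category, location))
    # Impose a deterministic order so a sequence encoder sees a stable input.
    steps.sort(key=lambda step: (step[1][0], step[1][1]))
    return steps


layout = [
    LayoutObject("dog", (320.0, 240.0, 160.0, 120.0)),
    LayoutObject("person", (64.0, 48.0, 128.0, 240.0)),
]
sequence = layout_to_sequence(layout, image_width=640, image_height=480)
```

In a full model, each step of `sequence` would presumably be embedded (category plus location) and fed to the encoder one element at a time, with the decoder generating the sentence from the final encoder state.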

Anthology ID: D17-1017
Volume: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month: September
Year: 2017
Address: Copenhagen, Denmark
Editors: Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 177–187
URL: https://aclanthology.org/D17-1017/
DOI: 10.18653/v1/D17-1017
Cite (ACL): Xuwang Yin and Vicente Ordonez. 2017. Obj2Text: Generating Visually Descriptive Language from Object Layouts. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 177–187, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Obj2Text: Generating Visually Descriptive Language from Object Layouts (Yin & Ordonez, EMNLP 2017)
PDF: https://aclanthology.org/D17-1017.pdf
Video: https://aclanthology.org/D17-1017.mp4
Data: MS COCO