@inproceedings{han-zarriess-2019-sketch,
title = "Sketch Me if You Can: Towards Generating Detailed Descriptions of Object Shape by Grounding in Images and Drawings",
author = "Han, Ting and
Zarrie{\ss}, Sina",
editor = "van Deemter, Kees and
Lin, Chenghua and
Takamura, Hiroya",
booktitle = "Proceedings of the 12th International Conference on Natural Language Generation",
month = oct # "{--}" # nov,
year = "2019",
address = "Tokyo, Japan",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W19-8618",
doi = "10.18653/v1/W19-8618",
pages = "136--140",
abstract = "A lot of recent work in Language {\&} Vision has looked at generating descriptions or referring expressions for objects in scenes of real-world images, though focusing mostly on relatively simple language like object names, color and location attributes (e.g., brown chair on the left). This paper presents work on Draw-and-Tell, a dataset of detailed descriptions for common objects in images where annotators have produced fine-grained attribute-centric expressions distinguishing a target object from a range of similar objects. Additionally, the dataset comes with hand-drawn sketches for each object. As Draw-and-Tell is medium-sized and contains a rich vocabulary, it constitutes an interesting challenge for CNN-LSTM architectures used in state-of-the-art image captioning models. We explore whether the additional modality given through sketches can help such a model to learn to accurately ground detailed language referring expressions to object shapes. Our results are encouraging.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="han-zarriess-2019-sketch">
<titleInfo>
<title>Sketch Me if You Can: Towards Generating Detailed Descriptions of Object Shape by Grounding in Images and Drawings</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ting</namePart>
<namePart type="family">Han</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sina</namePart>
<namePart type="family">Zarrieß</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2019-oct–nov</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 12th International Conference on Natural Language Generation</title>
</titleInfo>
<name type="personal">
<namePart type="given">Kees</namePart>
<namePart type="family">van Deemter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chenghua</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hiroya</namePart>
<namePart type="family">Takamura</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Tokyo, Japan</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>A lot of recent work in Language &amp; Vision has looked at generating descriptions or referring expressions for objects in scenes of real-world images, though focusing mostly on relatively simple language like object names, color and location attributes (e.g., brown chair on the left). This paper presents work on Draw-and-Tell, a dataset of detailed descriptions for common objects in images where annotators have produced fine-grained attribute-centric expressions distinguishing a target object from a range of similar objects. Additionally, the dataset comes with hand-drawn sketches for each object. As Draw-and-Tell is medium-sized and contains a rich vocabulary, it constitutes an interesting challenge for CNN-LSTM architectures used in state-of-the-art image captioning models. We explore whether the additional modality given through sketches can help such a model to learn to accurately ground detailed language referring expressions to object shapes. Our results are encouraging.</abstract>
<identifier type="citekey">han-zarriess-2019-sketch</identifier>
<identifier type="doi">10.18653/v1/W19-8618</identifier>
<location>
<url>https://aclanthology.org/W19-8618</url>
</location>
<part>
<date>2019-oct–nov</date>
<extent unit="page">
<start>136</start>
<end>140</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Sketch Me if You Can: Towards Generating Detailed Descriptions of Object Shape by Grounding in Images and Drawings
%A Han, Ting
%A Zarrieß, Sina
%Y van Deemter, Kees
%Y Lin, Chenghua
%Y Takamura, Hiroya
%S Proceedings of the 12th International Conference on Natural Language Generation
%D 2019
%8 oct–nov
%I Association for Computational Linguistics
%C Tokyo, Japan
%F han-zarriess-2019-sketch
%X A lot of recent work in Language & Vision has looked at generating descriptions or referring expressions for objects in scenes of real-world images, though focusing mostly on relatively simple language like object names, color and location attributes (e.g., brown chair on the left). This paper presents work on Draw-and-Tell, a dataset of detailed descriptions for common objects in images where annotators have produced fine-grained attribute-centric expressions distinguishing a target object from a range of similar objects. Additionally, the dataset comes with hand-drawn sketches for each object. As Draw-and-Tell is medium-sized and contains a rich vocabulary, it constitutes an interesting challenge for CNN-LSTM architectures used in state-of-the-art image captioning models. We explore whether the additional modality given through sketches can help such a model to learn to accurately ground detailed language referring expressions to object shapes. Our results are encouraging.
%R 10.18653/v1/W19-8618
%U https://aclanthology.org/W19-8618
%U https://doi.org/10.18653/v1/W19-8618
%P 136-140
Markdown (Informal)
[Sketch Me if You Can: Towards Generating Detailed Descriptions of Object Shape by Grounding in Images and Drawings](https://aclanthology.org/W19-8618) (Han & Zarrieß, INLG 2019)