Informativity in Image Captions vs. Referring Expressions

Elizabeth Coppock, Danielle Dionne, Nathanial Graham, Elias Ganem, Shijie Zhao, Shawn Lin, Wenxing Liu, Derry Wijaya


Abstract
At the intersection of computer vision and natural language processing, there has been recent progress on two natural language generation tasks: Dense Image Captioning and Referring Expression Generation for objects in complex scenes. The former aims to provide a caption for a specified object in a complex scene for the benefit of an interlocutor who may not be able to see it. The latter aims to produce a referring expression that will serve to identify a given object in a scene that the interlocutor can see. The two tasks are designed for different assumptions about the common ground between the interlocutors and serve very different purposes, although both associate a linguistic description with an object in a complex scene. Despite these fundamental differences, the distinction between the two tasks is sometimes overlooked. Here, we undertake a side-by-side comparison of human-produced image captioning and reference game datasets and show that they differ systematically with respect to informativity. We hope that an understanding of the systematic differences among these human datasets will ultimately allow them to be leveraged more effectively in the associated engineering tasks.
Anthology ID:
2020.pam-1.14
Volume:
Proceedings of the Probability and Meaning Conference (PaM 2020)
Month:
June
Year:
2020
Address:
Gothenburg
Editors:
Christine Howes, Stergios Chatzikyriakidis, Adam Ek, Vidya Somashekarappa
Venue:
PaM
Publisher:
Association for Computational Linguistics
Pages:
104–108
URL:
https://aclanthology.org/2020.pam-1.14
Cite (ACL):
Elizabeth Coppock, Danielle Dionne, Nathanial Graham, Elias Ganem, Shijie Zhao, Shawn Lin, Wenxing Liu, and Derry Wijaya. 2020. Informativity in Image Captions vs. Referring Expressions. In Proceedings of the Probability and Meaning Conference (PaM 2020), pages 104–108, Gothenburg. Association for Computational Linguistics.
Cite (Informal):
Informativity in Image Captions vs. Referring Expressions (Coppock et al., PaM 2020)
PDF:
https://aclanthology.org/2020.pam-1.14.pdf
Data:
Visual Genome