Leveraging Partial Dependency Trees to Control Image Captions

Wenjie Zhong, Yusuke Miyao


Abstract
Controlling the generation of image captions has recently attracted considerable attention. In this paper, we propose a framework that leverages partial syntactic dependency trees as control signals so that image captions include specified words and their syntactic structures. To this end, we propose a Syntactic Dependency Structure Aware Model (SDSAM), which explicitly learns to generate the syntactic structures of image captions so as to include given partial dependency trees. In addition, we introduce a metric that evaluates how many of the specified words and their syntactic dependencies are included in the generated captions. We carry out experiments on two standard datasets, Microsoft COCO and Flickr30k. Empirical results show that the image captions generated by our model are effectively controlled with respect to the specified words and their syntactic structures. The code is available on GitHub.
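The abstract describes a metric that counts how many specified words and their syntactic dependencies appear in a generated caption. The snippet below is a minimal illustrative sketch of that idea, not the paper's implementation: it assumes the control signal is given as (dependent, relation, head) triples and uses spaCy's off-the-shelf dependency parser; the function name and triple format are hypothetical.

import spacy

nlp = spacy.load("en_core_web_sm")  # off-the-shelf dependency parser (assumption)

def dependency_recall(caption: str, partial_tree: set[tuple[str, str, str]]) -> float:
    """Fraction of specified (dependent, relation, head) triples recovered in the caption."""
    doc = nlp(caption)
    # Collect lower-cased (dependent, relation, head) triples from the parse.
    parsed = {(tok.text.lower(), tok.dep_, tok.head.text.lower()) for tok in doc}
    if not partial_tree:
        return 1.0
    matched = sum(1 for triple in partial_tree if triple in parsed)
    return matched / len(partial_tree)

# Example: require "dog" to be the nominal subject of "running".
constraints = {("dog", "nsubj", "running")}
print(dependency_recall("A brown dog is running on the grass.", constraints))

A recall of 1.0 would mean every specified word and dependency edge was realized in the caption; the paper's actual metric may differ in how partial trees are matched.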
Anthology ID:
2021.alvr-1.3
Volume:
Proceedings of the Second Workshop on Advances in Language and Vision Research
Month:
June
Year:
2021
Address:
Online
Editors:
Xin Wang, Ronghang Hu, Drew Hudson, Tsu-Jui Fu, Marcus Rohrbach, Daniel Fried
Venue:
ALVR
Publisher:
Association for Computational Linguistics
Pages:
16–21
URL:
https://aclanthology.org/2021.alvr-1.3
DOI:
10.18653/v1/2021.alvr-1.3
Cite (ACL):
Wenjie Zhong and Yusuke Miyao. 2021. Leveraging Partial Dependency Trees to Control Image Captions. In Proceedings of the Second Workshop on Advances in Language and Vision Research, pages 16–21, Online. Association for Computational Linguistics.
Cite (Informal):
Leveraging Partial Dependency Trees to Control Image Captions (Zhong & Miyao, ALVR 2021)
PDF:
https://aclanthology.org/2021.alvr-1.3.pdf
Data
Flickr30k