Improving Reinforcement Learning Based Image Captioning with Natural Language Prior

Tszhang Guo, Shiyu Chang, Mo Yu, Kun Bai


Abstract
Recently, Reinforcement Learning (RL) approaches have demonstrated advanced performance in image captioning by directly optimizing the metric used for testing. However, this shaped reward introduces learning biases, which reduce the readability of the generated text. In addition, the large sample space makes training unstable and slow. To alleviate these issues, we propose a simple, coherent solution that constrains the action space using an n-gram language prior. Quantitative and qualitative evaluations on benchmarks show that RL with this simple add-on module performs favorably against its counterpart in terms of both readability and speed of convergence. Human evaluation results show that the captions produced by our model are more readable and graceful. The implementation will become publicly available upon the acceptance of the paper.
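As a rough illustration of what "constraining the action space using an n-gram language prior" could look like, the sketch below restricts the next-word distribution to words that a bigram table has seen after the previous word. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`build_bigram_prior`, `constrained_distribution`), the toy corpus, the choice of bigrams, and the fallback to the full vocabulary are all hypothetical.

```python
import math
from collections import defaultdict


def build_bigram_prior(corpus):
    """Map each word to the set of words observed directly after it.

    `corpus` is a list of tokenized sentences (hypothetical toy data here).
    """
    allowed = defaultdict(set)
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            allowed[prev].add(nxt)
    return allowed


def constrained_distribution(logits, prev_word, allowed):
    """Softmax over only the words the bigram prior permits after prev_word.

    `logits` maps each vocabulary word to an unnormalized policy score.
    Assumption: if the prior permits nothing, fall back to the full vocabulary.
    """
    permitted = allowed.get(prev_word, set())
    candidates = {w: s for w, s in logits.items() if w in permitted}
    if not candidates:
        candidates = dict(logits)
    top = max(candidates.values())  # subtract the max for numerical stability
    exp_scores = {w: math.exp(s - top) for w, s in candidates.items()}
    total = sum(exp_scores.values())
    return {w: v / total for w, v in exp_scores.items()}


# Toy usage: after "a", only "dog" and "cat" were ever observed.
corpus = [["a", "dog", "runs"], ["a", "cat", "sits"]]
allowed = build_bigram_prior(corpus)
logits = {"a": 2.0, "dog": 1.0, "cat": 1.0, "runs": 0.5, "sits": 0.5, "</s>": 0.0}
probs = constrained_distribution(logits, "a", allowed)
```

In an RL captioner, sampling from `probs` rather than from the full softmax would shrink the set of actions explored at each step, which is the intuition behind the readability and convergence claims in the abstract; the paper itself should be consulted for the actual formulation.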
Anthology ID:
D18-1083
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
751–756
URL:
https://aclanthology.org/D18-1083
DOI:
10.18653/v1/D18-1083
Cite (ACL):
Tszhang Guo, Shiyu Chang, Mo Yu, and Kun Bai. 2018. Improving Reinforcement Learning Based Image Captioning with Natural Language Prior. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 751–756, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior (Guo et al., EMNLP 2018)
PDF:
https://aclanthology.org/D18-1083.pdf
Attachment:
 D18-1083.Attachment.zip
Code:
 tgGuo15/PriorImageCaption