Image Caption Generation for News Articles

Zhishen Yang, Naoaki Okazaki


Abstract
In this paper, we address the task of news-image captioning: generating a description of an image given the image and the body of its accompanying article. This task is more challenging than conventional image captioning because it requires a joint understanding of image and text. We present a Transformer model that integrates the text and image modalities and attends to textual features from visual features when generating a caption. Experiments with automatic evaluation metrics and human evaluation show that the article text provides the primary information needed to reproduce news-image captions written by journalists. The results also demonstrate that the proposed model outperforms the state-of-the-art model. In addition, we confirm that visual features contribute to improving the quality of news-image captions.
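
The core mechanism the abstract describes, visual features attending over the article's textual features inside a Transformer, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository below); the module name, feature dimensions, and the use of PyTorch's nn.MultiheadAttention are assumptions made for illustration only.

import torch
import torch.nn as nn

class VisualToTextAttention(nn.Module):
    """Hypothetical sketch of the cross-modal step from the abstract:
    visual features act as queries attending over the encoded article
    text (keys/values). Names and sizes are illustrative, not the
    authors' code."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (batch, n_regions, d_model), e.g. image region features
        # text:   (batch, n_tokens,  d_model), e.g. encoded article body
        attended, _ = self.attn(query=visual, key=text, value=text)
        # Residual connection plus layer norm, as in a standard
        # Transformer cross-attention sublayer.
        return self.norm(visual + attended)

# Toy usage: 2 images with 49 regions each, articles of 300 tokens.
vis = torch.randn(2, 49, 512)
txt = torch.randn(2, 300, 512)
fused = VisualToTextAttention()(vis, txt)  # shape: (2, 49, 512)

In a full captioning model, the fused representation would condition a caption decoder; the sketch only shows how textual features can be attended to from visual features.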
Anthology ID:
2020.coling-main.176
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1941–1951
URL:
https://aclanthology.org/2020.coling-main.176
DOI:
10.18653/v1/2020.coling-main.176
Cite (ACL):
Zhishen Yang and Naoaki Okazaki. 2020. Image Caption Generation for News Articles. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1941–1951, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Image Caption Generation for News Articles (Yang & Okazaki, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.176.pdf
Code:
nlp-titech/news_image_captioning_for_news_articles
Data:
Places