UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning

Hwanhee Lee; Seunghyun Yoon; Franck Dernoncourt; Trung Bui; Kyomin Jung

doi:10.18653/v1/2021.acl-short.29

Abstract

Despite the success of text generation metrics such as BERTScore, evaluating image captions without enough reference captions remains difficult because valid descriptions of an image are diverse. In this paper, we introduce UMIC, an Unreferenced Metric for Image Captioning, which does not require reference captions to evaluate image captions. Building on Vision-and-Language BERT, we train UMIC to discriminate negative captions via contrastive learning. We also observe critical problems in the previous benchmark dataset (i.e., human annotations) for image captioning metrics, and introduce a new collection of human annotations on generated captions. We validate UMIC on four datasets, including our new dataset, and show that UMIC has a higher correlation with human judgments than all previous metrics that require multiple references.
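
The abstract does not spell out the training objective, so the following is only an illustrative sketch of one common way to train a scorer to discriminate negative captions: a pairwise margin ranking loss. The function name `margin_ranking_loss`, the `margin` value, and the example scores are assumptions made for illustration; they are not taken from the paper or its released code.

```python
def margin_ranking_loss(pos_score, neg_scores, margin=0.2):
    """Average hinge loss over (positive, negative) caption pairs.

    pos_score  : score the model assigns to the correct caption (hypothetical).
    neg_scores : scores assigned to negative captions for the same image.
    margin     : how far the positive score should exceed each negative score
                 (an assumed hyperparameter, not a value from the paper).
    """
    if not neg_scores:
        raise ValueError("at least one negative caption score is required")
    # Each pair contributes zero loss once the positive caption outscores
    # the negative caption by at least `margin`.
    losses = [max(0.0, margin - (pos_score - neg)) for neg in neg_scores]
    return sum(losses) / len(losses)


# Well-separated scores incur no loss; a negative that outscores the
# positive caption is penalized.
separated = margin_ranking_loss(0.9, [0.1, 0.3])
violated = margin_ranking_loss(0.4, [0.6])
```

In this formulation the scorer is only pushed to rank the correct caption above the negatives, which is why no reference captions are needed at evaluation time: the trained score for an image-caption pair is used directly.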

Anthology ID: 2021.acl-short.29
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: August
Year: 2021
Address: Online
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 220–226
URL: https://aclanthology.org/2021.acl-short.29
DOI: 10.18653/v1/2021.acl-short.29
Cite (ACL): Hwanhee Lee, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, and Kyomin Jung. 2021. UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 220–226, Online. Association for Computational Linguistics.
Cite (Informal): UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning (Lee et al., ACL-IJCNLP 2021)
PDF: https://aclanthology.org/2021.acl-short.29.pdf
Optional supplementary material: 2021.acl-short.29.OptionalSupplementaryMaterial.pdf
Video: https://aclanthology.org/2021.acl-short.29.mp4
Code: hwanheelee1993/UMIC
