Models See Hallucinations: Evaluating the Factuality in Video Captioning

Hui Liu; Xiaojun Wan

doi:10.18653/v1/2023.emnlp-main.723

Abstract

Video captioning aims to describe events in a video with natural language. In recent years, many works have focused on improving captioning models’ performance. However, like other text generation tasks, it risks introducing factual errors not supported by the input video. Factual errors can seriously affect the quality of the generated text, sometimes making it completely unusable. Although factual consistency has received much research attention in text-to-text tasks (e.g., summarization), it is less studied in vision-based text generation. In this work, we conduct the first human evaluation of the factuality in video captioning and annotate two factuality datasets. We find that 56% of the model-generated sentences have factual errors, indicating it is a severe problem in this field, but existing evaluation metrics show little correlation with human factuality annotation. We further propose a weakly-supervised, model-based factuality metric FactVC, which outperforms previous metrics on factuality evaluation of video captioning.

Anthology ID: 2023.emnlp-main.723
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11807–11823
URL: https://aclanthology.org/2023.emnlp-main.723
DOI: 10.18653/v1/2023.emnlp-main.723
Cite (ACL): Hui Liu and Xiaojun Wan. 2023. Models See Hallucinations: Evaluating the Factuality in Video Captioning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11807–11823, Singapore. Association for Computational Linguistics.
Cite (Informal): Models See Hallucinations: Evaluating the Factuality in Video Captioning (Liu & Wan, EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.723.pdf
Video: https://aclanthology.org/2023.emnlp-main.723.mp4
