Evaluating and Improving Factuality in Multimodal Abstractive Summarization

David Wan, Mohit Bansal


Abstract
Current metrics for evaluating factuality in abstractive document summarization have achieved high correlations with human judgment, but they do not account for the vision modality and thus are not adequate for vision-and-language summarization. We propose CLIPBERTSCORE, a simple weighted combination of CLIPScore and BERTScore that leverages their robustness and strong factuality detection performance on image-summary and document-summary pairs, respectively. Next, due to the lack of meta-evaluation benchmarks for assessing the quality of multimodal factuality metrics, we collect human judgments of factuality with respect to both documents and images. We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization, outperforms an existing multimodal summarization metric, and performs competitively with strong multimodal factuality metrics specifically fine-tuned for the task. Our thorough analysis demonstrates the robustness and high correlation of CLIPBERTSCORE and its components on four factuality metric-evaluation benchmarks. Finally, we demonstrate two practical downstream applications of our CLIPBERTSCORE metric: selecting important images to focus on during training, and serving as a reward for reinforcement learning to improve the factuality of multimodal summary generation w.r.t. automatic and human evaluation.
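To illustrate the metric described above, here is a minimal sketch of a weighted combination of an image-summary CLIPScore with a document-summary BERTScore. This is not the authors' released implementation: the helper functions `clip_fn` and `bert_fn` and the mixing weight `alpha` are placeholders for whichever CLIPScore/BERTScore implementations and weighting one chooses.

```python
def clipbertscore(image, document, summary, clip_fn, bert_fn, alpha=0.5):
    """Hypothetical sketch: combine image-summary and document-summary factuality scores.

    clip_fn(image, summary)    -> float, CLIPScore-style visual faithfulness
    bert_fn(document, summary) -> float, BERTScore-style textual faithfulness
    alpha                      -> assumed mixing weight (tunable on dev data)
    """
    clip_component = clip_fn(image, summary)      # faithfulness to the image
    bert_component = bert_fn(document, summary)   # faithfulness to the document
    return alpha * clip_component + (1.0 - alpha) * bert_component
```

In this sketch the two component scores are assumed to be on comparable scales; in practice one would normalize them and tune `alpha` against human factuality judgments.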
Anthology ID:
2022.emnlp-main.654
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9632–9648
URL:
https://aclanthology.org/2022.emnlp-main.654
DOI:
10.18653/v1/2022.emnlp-main.654
Cite (ACL):
David Wan and Mohit Bansal. 2022. Evaluating and Improving Factuality in Multimodal Abstractive Summarization. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9632–9648, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Evaluating and Improving Factuality in Multimodal Abstractive Summarization (Wan & Bansal, EMNLP 2022)
PDF:
https://aclanthology.org/2022.emnlp-main.654.pdf