MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization

Yinhong Liu; Jianfeng He; Hang Su; Ruixue Lian; Yi Nian; Jake W. Vincent; Srikanth Vishnubhotla; Robinson Piramuthu; Saab Mansour

Abstract

Multimodal Dialogue Summarization (MDS) is a critical task with wide-ranging applications. To support the development of effective MDS models, robust automatic evaluation methods are essential for reducing both cost and human effort. However, such methods require a strong meta-evaluation benchmark grounded in human annotations. In this work, we introduce MDSEval, the first meta-evaluation benchmark for MDS, consisting of image-sharing dialogues, corresponding summaries, and human judgments across eight well-defined quality aspects. To ensure data quality and richness, we propose a novel filtering framework leveraging Mutually Exclusive Key Information (MEKI) across modalities. Our work is the first to identify and formalize key evaluation dimensions specific to MDS. Finally, we benchmark state-of-the-art modal evaluation methods, revealing their limitations in distinguishing summaries from advanced MLLMs and their susceptibility to various biases.

Anthology ID: 2025.findings-emnlp.794
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14707–14727
URL: https://aclanthology.org/2025.findings-emnlp.794/
Cite (ACL): Yinhong Liu, Jianfeng He, Hang Su, Ruixue Lian, Yi Nian, Jake W. Vincent, Srikanth Vishnubhotla, Robinson Piramuthu, and Saab Mansour. 2025. MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14707–14727, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization (Liu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.794.pdf
Checklist: 2025.findings-emnlp.794.checklist.pdf
