PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

Kun Ouyang; Yuanxin Liu; Shicheng Li; Yi Liu; Hao Zhou (昊 周); Fandong Meng; Jie Zhou; Xu Sun

doi:10.18653/v1/2025.acl-long.49

Abstract

Multimodal punchlines, which convey humor or sarcasm through image-caption pairs, are a popular means of communication on online multimedia platforms. With the rapid development of multimodal large language models (MLLMs), it is essential to assess how well they comprehend these punchlines. However, existing benchmarks on punchline comprehension suffer from three major limitations: 1) language shortcuts that allow models to rely solely on text, 2) a lack of question diversity, and 3) a narrow focus on a specific domain of multimodal content (e.g., cartoons). To address these limitations, we introduce a multimodal **Punch**line comprehension **Bench**mark, named **PunchBench**, which is tailored for accurate and comprehensive evaluation of punchline comprehension. To improve evaluation accuracy, we generate synonymous and antonymous captions by modifying the original captions, which mitigates the impact of shortcuts in the captions. To provide a comprehensive evaluation, PunchBench incorporates diverse question formats and image-caption pairs from various domains. On this basis, we conduct extensive evaluations and reveal a significant gap between state-of-the-art MLLMs and humans in punchline comprehension. To improve punchline comprehension, we propose the Simple-to-Complex Chain-of-Question (SC-CoQ) strategy, which enables models to address complicated questions incrementally by first mastering simple ones. SC-CoQ effectively enhances the performance of various MLLMs on PunchBench, surpassing in-context learning and chain-of-thought.
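The abstract describes SC-CoQ only at a high level, so the following is a minimal sketch of the general idea of asking simple questions before complex ones, not the authors' implementation. The function name `chain_of_questions`, the integer difficulty ranks, the `Q:`/`A:` prompt layout, and the `ask` callable standing in for an MLLM are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def chain_of_questions(
    questions: List[Tuple[int, str]],
    ask: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Ask questions from simplest to most complex.

    `questions` holds (difficulty_rank, question_text) pairs; the ranks are
    a hypothetical stand-in for whatever ordering a real system would use.
    `ask` is any callable that maps a prompt string to an answer string.
    """
    history: List[Tuple[str, str]] = []
    # Sort by the assumed difficulty rank so simple questions come first.
    for _, question in sorted(questions, key=lambda q: q[0]):
        # Carry every earlier question/answer pair forward as context.
        context = "".join(f"Q: {q}\nA: {a}\n" for q, a in history)
        prompt = f"{context}Q: {question}\nA:"
        history.append((question, ask(prompt)))
    return history
```

In this sketch the answer to each simpler question becomes part of the prompt for the next, which is one plausible way to let a model build up to a harder question step by step.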

Anthology ID: 2025.acl-long.49
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 986–1008
URL: https://aclanthology.org/2025.acl-long.49/
DOI: 10.18653/v1/2025.acl-long.49
Cite (ACL): Kun Ouyang, Yuanxin Liu, Shicheng Li, Yi Liu, Hao Zhou, Fandong Meng, Jie Zhou, and Xu Sun. 2025. PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 986–1008, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension (Ouyang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.49.pdf
