Dunhuang-Bench: How Well Do MLLMs Understand Cultural Heritage?

Junyi Yuan; Jian Zhang; Tianxiu Yu; Yanlin Zhou; Xiaobo Jin; Qiufeng Wang; Fangyu Wu

Abstract

Dunhuang art, a cornerstone of global heritage, demands fine-grained visual perception anchored by specialized cultural knowledge. Given the strong performance of multimodal large language models (MLLMs) on generic multimodal benchmarks, to what extent can they understand Dunhuang artifacts, whose interpretation is grounded in cultural context? To answer this question, we construct Dunhuang-Bench, a large-scale benchmark comprising 486 images and 22,970 QA pairs. It incorporates diverse task formats to evaluate MLLMs’ cultural understanding: Question Answering with Text Description, Multi-turn Dialogue, and Question Answering with Choices. Guided by Panofsky’s theory of iconology, we design two tasks, visual perception and knowledge reasoning, to evaluate content understanding. In addition, we follow the formal-analytic tradition to design a third task, artistic appreciation. Extensive evaluations of 20 mainstream MLLMs on Dunhuang-Bench reveal a consistent performance drop from perception and appreciation to reasoning. Moreover, chain-of-thought (CoT) and few-shot prompting have marginal or even negative impact, highlighting the limits of prompting-based improvements. Dunhuang-Bench thus provides a challenging benchmark for advancing multimodal cultural understanding. Data and code will be publicly available.

Anthology ID: 2026.findings-acl.888
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 17894–17910
URL: https://aclanthology.org/2026.findings-acl.888/
Cite (ACL): Junyi Yuan, Jian Zhang, Tianxiu Yu, Yanlin Zhou, Xiaobo Jin, Qiufeng Wang, and Fangyu Wu. 2026. Dunhuang-Bench: How Well Do MLLMs Understand Cultural Heritage? In Findings of the Association for Computational Linguistics: ACL 2026, pages 17894–17910, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Dunhuang-Bench: How Well Do MLLMs Understand Cultural Heritage? (Yuan et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.888.pdf
Checklist: 2026.findings-acl.888.checklist.pdf
