VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery

Jinchao Ge; Tengfei Cheng; Biao Wu; Zeyu Zhang; Shiya Huang; Judith Bishop; Gillian Shepherd; Meng Fang; Ling Chen; Yang Zhao

Abstract

Understanding cultural heritage artifacts such as ancient Greek pottery requires expert-level reasoning that remains challenging for current multimodal large language models (MLLMs) due to limited domain-specific data. We introduce VaseVQA, a benchmark for ancient Greek pottery, primarily vases, consisting of 31,773 images and 67,614 question–answer pairs across seven expert-defined categories, enabling systematic evaluation of expert-level cultural heritage understanding. Using this dataset, we explore effective training strategies for domain-specific reasoning. While supervised fine-tuning (SFT) improves adaptation to domain knowledge, it struggles with deeper reasoning tasks. We propose VaseVL, which augments SFT with reinforcement learning using verifiable rewards. Experiments show that VaseVL consistently outperforms supervised baselines, especially on reasoning-intensive questions, highlighting the value of targeted reinforcement learning for cultural heritage visual question answering.

Anthology ID: 2026.findings-eacl.60
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1154–1167
URL: https://aclanthology.org/2026.findings-eacl.60/
Cite (ACL): Jinchao Ge, Tengfei Cheng, Biao Wu, Zeyu Zhang, Shiya Huang, Judith Bishop, Gillian Shepherd, Meng Fang, Ling Chen, and Yang Zhao. 2026. VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery. In Findings of the Association for Computational Linguistics: EACL 2026, pages 1154–1167, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery (Ge et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.60.pdf
Checklist: 2026.findings-eacl.60.checklist.pdf
