SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

Chengye Wang; Yifei Shen; Zexi Kuang; Arman Cohan; Yilun Zhao

doi:10.18653/v1/2025.acl-long.420

Abstract

We introduce SciVer, the first benchmark specifically designed to evaluate the ability of foundation models to verify claims within a multimodal scientific context. SciVer consists of 3,000 expert-annotated examples over 1,113 scientific papers, covering four subsets, each representing a common reasoning type in multimodal scientific claim verification. To enable fine-grained evaluation, each example includes expert-annotated supporting evidence. We assess the performance of 21 state-of-the-art multimodal foundation models, including o4-mini, Gemini-2.5-Flash, Llama-3.2-Vision, and Qwen2.5-VL. Our experiment reveals a substantial performance gap between these models and human experts on SciVer. Through an in-depth analysis of retrieval-augmented generation (RAG) and human-conducted error evaluations, we identify critical limitations in current open-source models, offering key insights to advance models’ comprehension and reasoning in multimodal scientific literature tasks.

Anthology ID: 2025.acl-long.420
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8562–8579
URL: https://aclanthology.org/2025.acl-long.420/
DOI: 10.18653/v1/2025.acl-long.420
Cite (ACL): Chengye Wang, Yifei Shen, Zexi Kuang, Arman Cohan, and Yilun Zhao. 2025. SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8562–8579, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification (Wang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.420.pdf
