Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning

Yilun Zhao; Guo Gan; Chengye Wang; Chen Zhao; Arman Cohan

doi:10.18653/v1/2025.naacl-long.582

Abstract

We introduce RoMMath, the first benchmark designed to evaluate the capabilities and robustness of multimodal large language models (MLLMs) in multimodal math reasoning, particularly when faced with adversarial perturbations. RoMMath consists of 4,800 expert-annotated examples, including an original set and seven adversarial sets, each targeting a specific type of perturbation at either the text or the vision level. We evaluate a broad spectrum of 17 MLLMs on RoMMath and uncover a critical challenge regarding model robustness against adversarial perturbations. Through detailed error analysis by human experts, we gain a deeper understanding of the current limitations of MLLMs. Additionally, we explore various approaches to enhance the performance and robustness of MLLMs, providing insights that can guide future research efforts.

Anthology ID: 2025.naacl-long.582
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 11653–11665
URL: https://aclanthology.org/2025.naacl-long.582/
DOI: 10.18653/v1/2025.naacl-long.582
Cite (ACL): Yilun Zhao, Guo Gan, Chengye Wang, Chen Zhao, and Arman Cohan. 2025. Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 11653–11665, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning (Zhao et al., NAACL 2025)
PDF: https://aclanthology.org/2025.naacl-long.582.pdf
