ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code

Jian Xie; Zhendong Chu; Aoxiao Zhong; Kai Zhang; Mingzhe Han; Xing Fan; Jialie Shen; Qingsong Wen

Abstract

Large Reasoning Models (LRMs) often suffer from the “over-thinking” problem, generating unnecessarily long reasoning on simple tasks. Strategies such as length penalties and routing mechanisms have been proposed to mitigate this issue, but they are typically heuristic and task-specific, and they do not provide a general framework for adaptive reasoning. In this paper, we present ARM2, a unified model that adaptively balances reasoning performance and efficiency across multiple formats through a reinforcement learning framework augmented with length-aware optimization. Beyond conventional natural language inference, ARM2 integrates vision understanding, extending its applicability to multimodal tasks. Moreover, ARM2 integrates executable code into reasoning, enabling substantial reductions in token cost compared to long CoT while preserving task performance. Experiments demonstrate that ARM2 achieves performance on par with traditional reasoning models trained with GRPO, while reducing token usage by over 70% on average. We further conduct extensive analyses to validate the effectiveness of ARM2 and the soundness of its design.

Anthology ID: 2026.findings-acl.1365
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 27392–27405
URL: https://aclanthology.org/2026.findings-acl.1365/
Cite (ACL): Jian Xie, Zhendong Chu, Aoxiao Zhong, Kai Zhang, Mingzhe Han, Xing Fan, Jialie Shen, and Qingsong Wen. 2026. ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27392–27405, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code (Xie et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1365.pdf
Checklist: 2026.findings-acl.1365.checklist.pdf
