SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning

Zhengyang Ai; Zikang Shan; Xiaodong Ai; Jingxian Tang; Hangkai Hu; Pinyan Lu

doi:10.18653/v1/2026.acl-long.1114

Abstract

Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing methods fail to distinguish meaningful progress from mere verbosity, which limits reasoning capability and leaves token inefficiency unresolved. To address this, we propose Stage-aware Hierarchical Advantage via Potential Estimation (SHAPE), a framework that formalizes reasoning as a trajectory through a state space of empirical solvability. SHAPE introduces a hierarchical credit assignment mechanism: at the segment level, a stage-aware advantage function prioritizes efficient breakthroughs in low-potential states; at the token level, entropy-driven redistribution sharpens execution signals. Extensive experiments on mathematical reasoning across three base models and five benchmarks demonstrate that SHAPE achieves an average accuracy gain of 3% while reducing token consumption by 30%.
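The abstract names the ingredients of SHAPE but gives no formulas, so the following is only an illustrative sketch under our own assumptions, not the paper's formulation. It assumes that (a) the "potential" of a reasoning prefix is its empirical solvability, estimated as the fraction of sampled completions from that prefix that reach a correct answer; (b) a hypothetical stage-aware segment advantage scales the potential gain by a weight `1 + alpha * (1 - v_k)` that grows in low-potential states and divides by segment length so that short breakthroughs outrank verbose ones; and (c) a hypothetical entropy-driven redistribution spreads each segment advantage over its tokens in proportion to token entropy. The function names `empirical_solvability`, `segment_advantages`, `token_entropy`, `token_advantages` and the parameter `alpha` are all illustrative inventions.

```python
import math
from typing import List, Sequence


def empirical_solvability(outcomes: Sequence[bool]) -> float:
    """Monte Carlo potential (assumed): fraction of rollouts from a prefix that end correct."""
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(outcomes) / len(outcomes)


def segment_advantages(potentials: Sequence[float],
                       seg_lengths: Sequence[int],
                       alpha: float = 1.0) -> List[float]:
    """Hypothetical stage-aware, per-token segment advantage.

    potentials[k] is the solvability before segment k and potentials[k + 1]
    the solvability after it. The gain is amplified when the starting
    potential is low and divided by the segment's token count, so a short
    breakthrough outranks a long segment with the same gain.
    """
    if len(potentials) != len(seg_lengths) + 1:
        raise ValueError("expected one more potential than segments")
    advs = []
    for k, n_tokens in enumerate(seg_lengths):
        gain = potentials[k + 1] - potentials[k]
        stage_weight = 1.0 + alpha * (1.0 - potentials[k])  # larger in low-potential states
        advs.append(stage_weight * gain / max(n_tokens, 1))
    return advs


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def token_advantages(seg_adv: float, entropies: Sequence[float]) -> List[float]:
    """Spread a segment advantage over its tokens in proportion to entropy (assumed scheme).

    Weights are normalized to mean 1, so the mean token advantage equals seg_adv.
    Falls back to a uniform split when every token has zero entropy.
    """
    if not entropies:
        return []
    mean_h = sum(entropies) / len(entropies)
    if mean_h == 0:
        return [seg_adv] * len(entropies)
    return [seg_adv * h / mean_h for h in entropies]


if __name__ == "__main__":
    # Potentials before segment 1, after segment 1, and after segment 2.
    v = [empirical_solvability([False] * 4 + [True]),              # 0.2
         empirical_solvability([True, True, True, False, False]),  # 0.6
         empirical_solvability([True] * 7 + [False] * 3)]          # 0.7
    seg = segment_advantages(v, seg_lengths=[4, 10])
    print(seg)  # roughly [0.18, 0.014]: the short jump from a low-potential state wins
    print(token_advantages(seg[0], [0.1, 0.9, 0.5, 0.5]))
```

Normalizing the entropy weights to mean 1 is one simple way to leave the segment-level signal unchanged on average while shifting credit toward high-uncertainty tokens; SHAPE's actual weighting and normalization may differ.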

Anthology ID:: 2026.acl-long.1114
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 24288–24305
URL:: https://aclanthology.org/2026.acl-long.1114/
DOI:: 10.18653/v1/2026.acl-long.1114
Cite (ACL):: Zhengyang Ai, Zikang Shan, Xiaodong Ai, Jingxian Tang, Hangkai Hu, and Pinyan Lu. 2026. SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 24288–24305, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning (Ai et al., ACL 2026)
PDF:: https://aclanthology.org/2026.acl-long.1114.pdf
Checklist:: 2026.acl-long.1114.checklist.pdf
