DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping

Wei Fan; Wenlin Yao; Zheng Li; Feng Yao; Xin Liu; Liang Qiu; Qingyu Yin; Yangqiu Song; Bing Yin

Abstract

Large language models (LLMs) augmented with multi-step reasoning and action generation abilities have shown promise in leveraging external tools to tackle complex tasks that require long-horizon planning. However, existing approaches either rely on implicit planning in the reasoning stage or introduce explicit planners without systematically addressing how to optimize the planning stage. As evidence, we observe that under vanilla reinforcement learning (RL), planning tokens exhibit significantly higher entropy than other action tokens, revealing uncertain decision points that remain under-optimized. To address this, we introduce DeepPlanner, an end-to-end RL framework that effectively enhances the planning capabilities of deep research agents. Our approach shapes token-level advantages with an entropy-based term to allocate larger updates to high-entropy tokens, and selectively upweights sample-level advantages for planning-intensive rollouts. Extensive experiments across seven deep research benchmarks demonstrate that DeepPlanner improves planning quality and achieves state-of-the-art results under a substantially lower training budget.

Anthology ID: 2026.findings-acl.370
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7510–7525
URL: https://aclanthology.org/2026.findings-acl.370/
Cite (ACL): Wei Fan, Wenlin Yao, Zheng Li, Feng Yao, Xin Liu, Liang Qiu, Qingyu Yin, Yangqiu Song, and Bing Yin. 2026. DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping. In Findings of the Association for Computational Linguistics: ACL 2026, pages 7510–7525, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping (Fan et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.370.pdf
Checklist: 2026.findings-acl.370.checklist.pdf
