Reinforcing Agentic Search Via Reward Density Optimization

Kun Luo; Hongjin Qian; Zheng Liu; Ziyi Xia; Shitao Xiao; Zhao Cao; Siqi Bao; Jun Zhao; Kang Liu

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is a promising approach for enhancing agentic search. However, its performance is often hindered by reward sparsity, whereby agents receive very limited positive feedback despite incurring significant exploration costs. In this paper, we formalize this challenge as a new research problem termed **Reward Density Optimization**, which aims to improve the reward obtained per unit of exploration cost. To address this problem, we introduce InfoFlow, a systematic framework that operates along three complementary dimensions: 1) **Sub-goal Scaffolding**, which decomposes long-horizon tasks into intermediate objectives and assigns process-level rewards to provide denser learning signals; 2) **Pathfinding Hints**, which inject corrective guidance into stalled trajectories to increase the ratio of successful trials; and 3) **Dual-agent Refinement**, which employs a dual-agent architecture to offload the cognitive burden of deep exploration. We evaluate InfoFlow on several popular agentic search benchmarks, where it significantly outperforms strong baselines and enables lightweight LLMs to achieve performance comparable to that of advanced proprietary models.
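
The abstract defines reward density only informally, as the reward obtained per unit of exploration cost. The minimal Python sketch below illustrates that ratio and why process-level sub-goal rewards can raise it; it is not InfoFlow's actual formulation. Everything in it is an assumption for illustration: exploration cost is taken to be the number of tool calls in a rollout, the scaffolded reward adds a bonus proportional to the fraction of sub-goals reached, and the names `Trajectory`, `outcome_reward`, `scaffolded_reward`, `reward_density`, and `subgoal_weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """A hypothetical rollout of a search agent (illustrative only)."""
    tool_calls: int           # ASSUMPTION: exploration cost = number of tool calls
    subgoals_hit: List[bool]  # hypothetical record of which sub-goals were reached
    answer_correct: bool      # verifiable check on the final answer


def outcome_reward(t: Trajectory) -> float:
    # Sparse, outcome-only reward: 1.0 only if the final answer verifies.
    return 1.0 if t.answer_correct else 0.0


def scaffolded_reward(t: Trajectory, subgoal_weight: float = 0.5) -> float:
    # ASSUMPTION: outcome reward plus a process-level bonus proportional to
    # the fraction of intermediate sub-goals reached.
    frac = sum(t.subgoals_hit) / len(t.subgoals_hit) if t.subgoals_hit else 0.0
    return outcome_reward(t) + subgoal_weight * frac


def reward_density(batch: List[Trajectory],
                   reward_fn: Callable[[Trajectory], float]) -> float:
    # Total reward divided by total exploration cost over a batch of rollouts.
    total_cost = sum(t.tool_calls for t in batch)
    if total_cost == 0:
        raise ValueError("batch has zero exploration cost")
    return sum(reward_fn(t) for t in batch) / total_cost


# Four hypothetical rollouts: only the third one reaches the correct answer.
batch = [
    Trajectory(tool_calls=8, subgoals_hit=[True, True, False], answer_correct=False),
    Trajectory(tool_calls=10, subgoals_hit=[True, False, False], answer_correct=False),
    Trajectory(tool_calls=6, subgoals_hit=[True, True, True], answer_correct=True),
    Trajectory(tool_calls=12, subgoals_hit=[False, False, False], answer_correct=False),
]

print(f"sparse density:     {reward_density(batch, outcome_reward):.3f}")     # 0.028
print(f"scaffolded density: {reward_density(batch, scaffolded_reward):.3f}")  # 0.056
```

With the same four hypothetical rollouts and the same total cost of 36 tool calls, only one rollout earns the sparse outcome reward, while the scaffolded reward also credits partial progress, which doubles the density in this toy batch. Because cost sits in the denominator, density can rise either through more reward per rollout or through cheaper rollouts.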

Anthology ID: 2026.acl-long.467
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 10261–10283
URL: https://aclanthology.org/2026.acl-long.467/
Cite (ACL): Kun Luo, Hongjin Qian, Zheng Liu, Ziyi Xia, Shitao Xiao, Zhao Cao, Siqi Bao, Jun Zhao, and Kang Liu. 2026. Reinforcing Agentic Search Via Reward Density Optimization. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10261–10283, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Reinforcing Agentic Search Via Reward Density Optimization (Luo et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.467.pdf
Checklist: 2026.acl-long.467.checklist.pdf
