One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

Yixiao Zhou; Dongzhou Cheng; Zhiliang Wu; Yi Yang; Yu Cheng; Hehe Fan

Abstract

Large Language Models (LLMs) often fail to utilize their latent reasoning capabilities due to a distributional mismatch between ambiguous human inquiries and the structured logic required for machine activation. Existing alignment methods either incur prohibitive O(N) costs by fine-tuning each model individually or rely on static prompts that fail to resolve query-level structural complexity. In this paper, we propose **ReQueR** (**Re**inforcement **Que**ry **R**efinement), a modular framework that treats reasoning elicitation as an inference-time alignment task. We train a specialized Refiner policy via Reinforcement Learning to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. Rooted in the classical Zone of Proximal Development from educational psychology, we introduce the Adaptive Solver Hierarchy, a curriculum mechanism that stabilizes training by dynamically aligning environmental difficulty with the Refiner’s evolving competence. ReQueR yields consistent absolute gains of 1.3%–7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average. Crucially, it provides a promising paradigm for one-to-many inference-time reasoning elicitation, enabling a single Refiner trained on a small set of models to effectively unlock reasoning in diverse unseen Solvers. Code is available at https://github.com/newera-xiao/ReQueR.
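The refine-then-solve loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the names `refine_and_solve`, `rollout_reward`, and the toy refiner and solvers are hypothetical, and the reward shown (the fraction of frozen Solvers that answer the refined query correctly) is an assumption, since the abstract does not specify the reward. The Adaptive Solver Hierarchy curriculum is not modeled here.

```python
from typing import Callable, Sequence

# Hypothetical type aliases: a Refiner rewrites a query, a Solver answers one.
Refiner = Callable[[str], str]
Solver = Callable[[str], str]


def refine_and_solve(query: str, refiner: Refiner, solver: Solver) -> str:
    """Inference-time pipeline: rewrite the raw query, then ask the frozen Solver."""
    return solver(refiner(query))


def rollout_reward(
    query: str, gold: str, refiner: Refiner, solvers: Sequence[Solver]
) -> float:
    """Assumed reward: fraction of frozen Solvers that answer the refined query correctly."""
    if not solvers:
        raise ValueError("at least one Solver is required")
    refined = refiner(query)
    correct = sum(solver(refined).strip() == gold.strip() for solver in solvers)
    return correct / len(solvers)


# Toy stand-ins so the sketch runs without any real model.
def toy_refiner(query: str) -> str:
    # Rewrites the raw query into an explicit step-by-step decomposition.
    return f"Step 1: identify the quantities. Step 2: compute. Question: {query}"


def toy_solver_strong(prompt: str) -> str:
    # Always answers correctly, regardless of how the query is phrased.
    return "4"


def toy_solver_weak(prompt: str) -> str:
    # Only succeeds when the prompt contains explicit steps.
    return "4" if "Step 1" in prompt else "unknown"
```

In this toy setting, an identity refiner leaves the weak solver stuck, while the decomposing refiner lets both solvers answer, so the assumed reward rises from 0.5 to 1.0. Only the Refiner would be updated by such a signal; the Solvers stay frozen.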

Anthology ID: 2026.acl-long.1807
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 38957–38978
URL: https://aclanthology.org/2026.acl-long.1807/
Cite (ACL): Yixiao Zhou, Dongzhou Cheng, Zhiliang Wu, Yi Yang, Yu Cheng, and Hehe Fan. 2026. One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 38957–38978, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement (Zhou et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1807.pdf
Checklist: 2026.acl-long.1807.checklist.pdf