Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

Yuan Sui; Yufei He; Tri Cao; Sophia Simeng Han; Yulin Chen; Bryan Hooi

Abstract

Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to “think about how to think”. It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy. The policy evaluates the current state of the LLM’s reasoning and selects the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9-12% in accuracy, while reducing inference time by 28-35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
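
To make the contextual-bandit idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a disjoint LinUCB bandit (the abstract does not say which CMAB algorithm is used), a hypothetical two-feature context `[1.0, progress]` where `progress` is an assumed score in [0, 1] summarizing how well the current reasoning is going, hypothetical strategy names, and a made-up reward function used only to exercise the code.

```python
import math
import random

# Hypothetical strategy names; the abstract mentions backtracking,
# switching approach, and restarting as examples.
STRATEGIES = ["continue", "backtrack", "switch_approach", "restart"]


class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per arm (an assumed choice)."""

    def __init__(self, n_arms, dim, alpha=1.5):
        self.alpha = alpha  # exploration weight
        # Inverse design matrix per arm, initialised to the identity (ridge prior).
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def select(self, x, explore=True):
        """Return the arm index with the highest (optionally optimistic) score."""
        scores = []
        for A_inv, b in zip(self.A_inv, self.b):
            theta = self._matvec(A_inv, b)  # ridge estimate of the arm's weights
            score = self._dot(theta, x)
            if explore:
                width = math.sqrt(max(self._dot(x, self._matvec(A_inv, x)), 0.0))
                score += self.alpha * width  # upper-confidence bonus
            scores.append(score)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Rank-one Sherman-Morrison update of A_inv, then accumulate b."""
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)  # A_inv is symmetric
        denom = 1.0 + self._dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(len(x)):
            self.b[arm][i] += reward * x[i]


def toy_reward(arm, progress):
    """Made-up reward: continuing pays off when progress is high, restarting when low."""
    if STRATEGIES[arm] == "continue":
        return progress
    if STRATEGIES[arm] == "restart":
        return 1.0 - progress
    return 0.2


def train_toy_bandit(rounds=2000, seed=0):
    rng = random.Random(seed)
    bandit = LinUCB(n_arms=len(STRATEGIES), dim=2)
    for _ in range(rounds):
        progress = rng.random()
        x = [1.0, progress]  # bias term plus the assumed progress feature
        arm = bandit.select(x)
        bandit.update(arm, x, toy_reward(arm, progress))
    return bandit


if __name__ == "__main__":
    bandit = train_toy_bandit()
    for progress in (0.95, 0.05):
        arm = bandit.select([1.0, progress], explore=False)
        print(progress, STRATEGIES[arm])
```

In a real system the context would be derived from the LLM's intermediate reasoning and the reward from some measure of downstream success; both are outside what the abstract specifies, so they are stubbed here.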

Anthology ID: 2026.findings-acl.649
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 13268–13286
URL: https://aclanthology.org/2026.findings-acl.649/
Cite (ACL): Yuan Sui, Yufei He, Tri Cao, Sophia Simeng Han, Yulin Chen, and Bryan Hooi. 2026. Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 13268–13286, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models (Sui et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.649.pdf
Checklist: 2026.findings-acl.649.checklist.pdf
