Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data

Xuanming Zhang; Shwan Ashrafi; Aziza Mirsaidova; Amir H. Rezaeian; Miguel Ballesteros; Lydia Chilton; Zhou Yu; Dan Roth

Abstract

We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than exhaustive reasoning, which incurs high inference costs. Many real-world tasks, such as trip planning, require models to deliver the best possible output within a fixed reasoning budget. We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase. To further enhance efficiency, we propose an inference-time self-improvement method using LLM-synthesized preference data, where models learn from their own reasoning comparisons to produce better intermediate solutions. Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models, improving both reasoning quality and efficiency under budget constraints.
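The abstract does not give the formula behind the Anytime Index. The sketch below is only a rough illustration of the general idea of scoring how quickly quality rises with reasoning tokens, and it makes several assumptions. It treats quality measured at several token budgets as a curve and takes the normalized area under that curve using the trapezoidal rule, so a model that reaches good answers at smaller budgets scores higher. The function name `anytime_area`, the trapezoidal formulation, and the normalization are hypothetical and are not taken from the paper. The exact metric is defined in the linked PDF.

```python
from typing import Sequence


def anytime_area(budgets: Sequence[float], scores: Sequence[float]) -> float:
    """Hypothetical anytime-style score (NOT the paper's Anytime Index).

    budgets: strictly increasing reasoning-token budgets.
    scores:  solution quality in [0, 1] measured at each budget.
    Returns the area under the quality-vs-budget curve (trapezoidal rule),
    divided by the budget span, so the result also lies in [0, 1].
    """
    if len(budgets) != len(scores) or len(budgets) < 2:
        raise ValueError("need at least two (budget, score) pairs")
    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        if width <= 0:
            raise ValueError("budgets must be strictly increasing")
        # Trapezoid between consecutive budget points.
        area += width * (scores[i - 1] + scores[i]) / 2.0
    # Normalize so a curve that is 1.0 at every budget scores 1.0.
    return area / (budgets[-1] - budgets[0])


if __name__ == "__main__":
    # Two hypothetical models with the same final quality (0.8).
    early = anytime_area([0, 500, 1000], [0.6, 0.8, 0.8])
    late = anytime_area([0, 500, 1000], [0.2, 0.4, 0.8])
    print(round(early, 2), round(late, 2))
```

Both hypothetical models end at the same quality, but the one that improves earlier gets the larger area (about 0.75 vs. 0.45). This captures the budget-aware intuition that a useful partial solution delivered early is worth more than the same answer delivered late.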

Anthology ID: 2026.findings-acl.417
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8587–8599
URL: https://aclanthology.org/2026.findings-acl.417/
Cite (ACL): Xuanming Zhang, Shwan Ashrafi, Aziza Mirsaidova, Amir H. Rezaeian, Miguel Ballesteros, Lydia Chilton, Zhou Yu, and Dan Roth. 2026. Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data. In Findings of the Association for Computational Linguistics: ACL 2026, pages 8587–8599, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data (Zhang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.417.pdf
Checklist: 2026.findings-acl.417.checklist.pdf
