Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic

Yichuan Ma; Linyang Li; Yongkang Chen; Peiji Li; Xiaozhe Li; Qipeng Guo; Dahua Lin; Kai Chen

Abstract

As large language models (LLMs) increasingly tackle complex reasoning tasks, test-time scaling has become critical for enhancing their capabilities. However, in agentic scenarios with frequent tool calls, the traditional definition of test-time scaling by generation length breaks down: tool latency decouples inference time from generation length. We propose Timely Machine, which redefines test-time as wall-clock time, so that models dynamically adjust their strategies based on time budgets. We introduce Timely-Eval, a benchmark spanning high-frequency tool calls, low-frequency tool calls, and time-constrained reasoning. By varying tool latency, we find that smaller models excel with fast feedback through more interactions, while larger models dominate high-latency settings via superior interaction quality. Moreover, existing models fail to adapt their reasoning to time budgets. We propose Timely-RL to address this gap. After cold-start supervised fine-tuning, we use reinforcement learning to enhance temporal planning. Timely-RL improves time budget awareness and consistently boosts performance across Timely-Eval. We hope our work offers a new perspective on test-time scaling for the agentic era.
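
The abstract's claim that tool latency decouples inference time from generation length can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function name `steps_within_budget`, the 60-second budget, and the per-step generation times are all hypothetical values chosen for illustration. It assumes each interaction costs one generation plus one tool call and counts how many complete interactions fit in a fixed wall-clock budget.

```python
def steps_within_budget(budget_s: float, gen_s: float, tool_latency_s: float) -> int:
    """Count complete generate-then-call-tool steps that fit in a wall-clock budget.

    Hypothetical cost model: every step takes gen_s seconds of generation
    plus tool_latency_s seconds waiting for the tool.
    """
    step_s = gen_s + tool_latency_s
    if step_s <= 0:
        raise ValueError("each step must take a positive amount of time")
    return int(budget_s // step_s)


# Illustrative numbers only (assumptions, not measurements from the paper).
BUDGET_S = 60.0      # wall-clock budget in seconds
SMALL_GEN_S = 1.0    # assumed per-step generation time of a smaller model
LARGE_GEN_S = 4.0    # assumed per-step generation time of a larger model

for latency_s in (0.5, 20.0):
    small = steps_within_budget(BUDGET_S, SMALL_GEN_S, latency_s)
    large = steps_within_budget(BUDGET_S, LARGE_GEN_S, latency_s)
    print(f"tool latency {latency_s:>4}s: small model {small} steps, large model {large} steps")
```

Under these assumed numbers, the faster model fits about three times as many interactions when the tool responds quickly, while at high latency both models fit the same number of steps, so per-step quality rather than step count would separate them. This only mirrors the intuition stated in the abstract; it says nothing about the paper's actual measurements.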

Anthology ID: 2026.acl-long.211
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4619–4636
URL: https://aclanthology.org/2026.acl-long.211/
Cite (ACL): Yichuan Ma, Linyang Li, Yongkang Chen, Peiji Li, Xiaozhe Li, Qipeng Guo, Dahua Lin, and Kai Chen. 2026. Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4619–4636, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Timely Machine: Awareness of Time Makes Test-Time Scaling Agentic (Ma et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.211.pdf
Checklist: 2026.acl-long.211.checklist.pdf
