BibTeX
@inproceedings{huang-etal-2025-fast,
title = "Fast Quiet-{ST}a{R}: Thinking Without Thought Tokens",
author = "Huang, Wei and
Xiong, Yizhe and
Ye, Xin and
Deng, Zhijie and
Chen, Hui and
Lin, Zijia and
Ding, Guiguang",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.1020/",
pages = "18771--18781",
ISBN = "979-8-89176-335-7",
abstract = "Large Language Models (LLMs) have achieved impressive performance across a range of natural language processing tasks. However, recent advances demonstrate that further gains{---}particularly in complex reasoning tasks{---}require more than merely scaling up model sizes or training data. One promising direction is to enable models to ``think'' during the reasoning process. Recently, Quiet-STaR significantly improves reasoning by generating token-level thought traces, but incurs substantial inference overhead. In this work, we propose Fast Quiet-STaR, a more efficient reasoning framework that preserves the benefits of token-level reasoning while reducing computational cost. Our method introduces a curriculum-learning-based training strategy that gradually reduces the number of thought tokens, enabling the model to internalize more abstract and concise reasoning processes. We further extend this approach to the standard Next Token Prediction (NTP) setting through reinforcement learning-based fine-tuning, resulting in Fast Quiet-STaR NTP, which eliminates the need for explicit thought token generation during inference. Experiments on four benchmark datasets with Mistral 7B and Qwen2.5 7B demonstrate that Fast Quiet-STaR consistently outperforms Quiet-STaR in terms of average accuracy under the same inference time budget. Notably, Fast Quiet-STaR NTP achieves an average accuracy improvement of 9{\%} on Mistral 7B and 5.7{\%} on Qwen2.5 7B, while maintaining the same inference latency."
}

MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="huang-etal-2025-fast">
<titleInfo>
<title>Fast Quiet-STaR: Thinking Without Thought Tokens</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wei</namePart>
<namePart type="family">Huang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yizhe</namePart>
<namePart type="family">Xiong</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xin</namePart>
<namePart type="family">Ye</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhijie</namePart>
<namePart type="family">Deng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hui</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zijia</namePart>
<namePart type="family">Lin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Guiguang</namePart>
<namePart type="family">Ding</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: EMNLP 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Christos</namePart>
<namePart type="family">Christodoulopoulos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tanmoy</namePart>
<namePart type="family">Chakraborty</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Carolyn</namePart>
<namePart type="family">Rose</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Violet</namePart>
<namePart type="family">Peng</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhou, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-335-7</identifier>
</relatedItem>
<abstract>Large Language Models (LLMs) have achieved impressive performance across a range of natural language processing tasks. However, recent advances demonstrate that further gains—particularly in complex reasoning tasks—require more than merely scaling up model sizes or training data. One promising direction is to enable models to “think” during the reasoning process. Recently, Quiet-STaR significantly improves reasoning by generating token-level thought traces, but incurs substantial inference overhead. In this work, we propose Fast Quiet-STaR, a more efficient reasoning framework that preserves the benefits of token-level reasoning while reducing computational cost. Our method introduces a curriculum-learning-based training strategy that gradually reduces the number of thought tokens, enabling the model to internalize more abstract and concise reasoning processes. We further extend this approach to the standard Next Token Prediction (NTP) setting through reinforcement learning-based fine-tuning, resulting in Fast Quiet-STaR NTP, which eliminates the need for explicit thought token generation during inference. Experiments on four benchmark datasets with Mistral 7B and Qwen2.5 7B demonstrate that Fast Quiet-STaR consistently outperforms Quiet-STaR in terms of average accuracy under the same inference time budget. Notably, Fast Quiet-STaR NTP achieves an average accuracy improvement of 9% on Mistral 7B and 5.7% on Qwen2.5 7B, while maintaining the same inference latency.</abstract>
<identifier type="citekey">huang-etal-2025-fast</identifier>
<location>
<url>https://aclanthology.org/2025.findings-emnlp.1020/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>18771</start>
<end>18781</end>
</extent>
</part>
</mods>
</modsCollection>

Endnote
%0 Conference Proceedings
%T Fast Quiet-STaR: Thinking Without Thought Tokens
%A Huang, Wei
%A Xiong, Yizhe
%A Ye, Xin
%A Deng, Zhijie
%A Chen, Hui
%A Lin, Zijia
%A Ding, Guiguang
%Y Christodoulopoulos, Christos
%Y Chakraborty, Tanmoy
%Y Rose, Carolyn
%Y Peng, Violet
%S Findings of the Association for Computational Linguistics: EMNLP 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-335-7
%F huang-etal-2025-fast
%X Large Language Models (LLMs) have achieved impressive performance across a range of natural language processing tasks. However, recent advances demonstrate that further gains—particularly in complex reasoning tasks—require more than merely scaling up model sizes or training data. One promising direction is to enable models to “think” during the reasoning process. Recently, Quiet-STaR significantly improves reasoning by generating token-level thought traces, but incurs substantial inference overhead. In this work, we propose Fast Quiet-STaR, a more efficient reasoning framework that preserves the benefits of token-level reasoning while reducing computational cost. Our method introduces a curriculum-learning-based training strategy that gradually reduces the number of thought tokens, enabling the model to internalize more abstract and concise reasoning processes. We further extend this approach to the standard Next Token Prediction (NTP) setting through reinforcement learning-based fine-tuning, resulting in Fast Quiet-STaR NTP, which eliminates the need for explicit thought token generation during inference. Experiments on four benchmark datasets with Mistral 7B and Qwen2.5 7B demonstrate that Fast Quiet-STaR consistently outperforms Quiet-STaR in terms of average accuracy under the same inference time budget. Notably, Fast Quiet-STaR NTP achieves an average accuracy improvement of 9% on Mistral 7B and 5.7% on Qwen2.5 7B, while maintaining the same inference latency.
%U https://aclanthology.org/2025.findings-emnlp.1020/
%P 18771-18781

Markdown (Informal)
[Fast Quiet-STaR: Thinking Without Thought Tokens](https://aclanthology.org/2025.findings-emnlp.1020/) (Huang et al., Findings 2025)

ACL
Wei Huang, Yizhe Xiong, Xin Ye, Zhijie Deng, Hui Chen, Zijia Lin, and Guiguang Ding. 2025. Fast Quiet-STaR: Thinking Without Thought Tokens. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18771–18781, Suzhou, China. Association for Computational Linguistics.
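
Note: the abstract above describes a curriculum that gradually shrinks the number of thought tokens until the model behaves as a plain next-token predictor. The sketch below only illustrates such a tapering schedule; every name, stage count, and token budget is an assumption for illustration, not code or settings from the paper.

```python
# Hypothetical sketch of a thought-token curriculum: training proceeds in
# stages, each stage allotting fewer "thought" tokens per position than the
# last, ending at zero (standard next-token prediction). Illustrative only.

from dataclasses import dataclass
from typing import List


@dataclass
class CurriculumStage:
    num_thought_tokens: int   # thought tokens generated per position at this stage
    training_steps: int       # optimization steps spent at this stage


def build_thought_token_curriculum(
    initial_thoughts: int = 12,
    num_stages: int = 4,
    steps_per_stage: int = 1000,
) -> List[CurriculumStage]:
    """Linearly taper the thought-token budget from `initial_thoughts` down to 0."""
    stages = []
    for i in range(num_stages):
        remaining = num_stages - 1 - i
        budget = round(initial_thoughts * remaining / (num_stages - 1))
        stages.append(CurriculumStage(num_thought_tokens=budget,
                                      training_steps=steps_per_stage))
    return stages


if __name__ == "__main__":
    for stage in build_thought_token_curriculum():
        print(stage)
    # CurriculumStage(num_thought_tokens=12, training_steps=1000)
    # CurriculumStage(num_thought_tokens=8, training_steps=1000)
    # CurriculumStage(num_thought_tokens=4, training_steps=1000)
    # CurriculumStage(num_thought_tokens=0, training_steps=1000)
```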