Parallel Test-Time Scaling for Latent Reasoning Models

Runyang You; Yongqi Li; Meng Liu; Wenjie Wang; Liqiang Nie; Wenjie Li

Abstract

Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs), typically by sampling multiple token-based chains-of-thought in parallel and aggregating outcomes through voting or search. Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought. Whether such latent models can similarly benefit from parallel TTS remains open, mainly because continuous spaces lack sampling mechanisms and because there are no probabilistic signals for advanced trajectory aggregation. This work enables parallel TTS for latent reasoning models by addressing both issues. For sampling, we introduce two uncertainty-inspired stochastic strategies: Monte Carlo Dropout and Additive Gaussian Noise. For aggregation, we design a Latent Reward Model (LatentRM), trained with a step-wise contrastive objective, to score and guide latent reasoning. Extensive experiments and visualization analyses show that both sampling strategies scale effectively with compute and exhibit distinct exploration dynamics, while LatentRM enables effective trajectory selection. Together, our explorations open a new direction for scalable inference in continuous spaces. Code and checkpoints are included as supplementary materials. GitHub Project: https://github.com/ModalityDance/LatentTTS
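The two sampling strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy latent vector, and the `toy_decode` stand-in are all hypothetical, and dropout is applied directly to a vector here, whereas in a real model it would be the network's own dropout layers kept active at inference. Majority voting is used as the simplest aggregation baseline; the LatentRM scorer described in the abstract is not reproduced.

```python
import random
from collections import Counter


def gaussian_noise_sample(latent, sigma, rng):
    """Return a perturbed copy of a latent vector: h + eps, eps ~ N(0, sigma^2)."""
    return [h + rng.gauss(0.0, sigma) for h in latent]


def dropout_sample(latent, p, rng):
    """Inverted dropout left on at inference: zero each unit with
    probability p and rescale the surviving units by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [h / keep if rng.random() >= p else 0.0 for h in latent]


def majority_vote(answers):
    """Return the most common answer (ties go to the earliest seen)."""
    return Counter(answers).most_common(1)[0][0]


def toy_decode(latent):
    """Hypothetical stand-in for decoding an answer from a latent state."""
    return round(sum(latent))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    latent = [0.5, -1.0, 2.0]  # toy latent "thought" vector (assumed)

    # Draw 8 parallel samples with each strategy, then aggregate by voting.
    noisy = [gaussian_noise_sample(latent, 0.1, rng) for _ in range(8)]
    dropped = [dropout_sample(latent, 0.2, rng) for _ in range(8)]
    print("Gaussian noise vote:", majority_vote([toy_decode(s) for s in noisy]))
    print("MC dropout vote:", majority_vote([toy_decode(s) for s in dropped]))
```

In this sketch, `sigma` and `p` control how far the samples spread from the original latent state, so larger values trade answer stability for wider exploration.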

Anthology ID: 2026.acl-long.2069
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44703–44717
URL: https://aclanthology.org/2026.acl-long.2069/
Cite (ACL): Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, and Wenjie Li. 2026. Parallel Test-Time Scaling for Latent Reasoning Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44703–44717, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Parallel Test-Time Scaling for Latent Reasoning Models (You et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2069.pdf
Checklist: 2026.acl-long.2069.checklist.pdf
