Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

Yubo Wang; Juntian Zhang; Yichen Wu; Yankai Lin (林衍凯); Nils Lukas; Yuhan Liu

Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

Yubo Wang, Juntian Zhang, Yichen Wu, Yankai Lin, Nils Lukas, Yuhan Liu

Abstract

While Chain-of-Thought empowers Large Vision-Language Models with multi-step reasoning, explicit textual rationales suffer from an information bandwidth bottleneck, where continuous visual details are discarded during discrete tokenization. Recent latent reasoning methods attempt to address this challenge, but often fall prey to premature semantic collapse due to rigid autoregressive objectives. In this paper, we propose Laser, a novel paradigm that reformulates visual deduction via Dynamic Windowed Alignment Learning. Instead of forcing a point-wise prediction, Laser aligns the latent state with a dynamic validity window of future semantics. This mechanism enforces a "Forest-before-Trees" cognitive hierarchy, enabling the model to maintain a probabilistic superposition of global features before narrowing down to local details. Crucially, Laser maintains interpretability via decodable trajectories while stabilizing unconstrained learning via Self-Refined Superposition. Extensive experiments on 6 benchmarks demonstrate that Laser achieves state-of-the-art performance among latent reasoning methods, surpassing the strong baseline Monet by 5.03% on average. Notably, it achieves these gains with extreme efficiency, reducing inference tokens by more than 97%, while demonstrating robust generalization to out-of-distribution domains.

Anthology ID:: 2026.acl-long.517
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 11272–11288
Language:
URL:: https://aclanthology.org/2026.acl-long.517/
DOI:
Bibkey:
Cite (ACL):: Yubo Wang, Juntian Zhang, Yichen Wu, Yankai Lin, Nils Lukas, and Yuhan Liu. 2026. Forest Before Trees: Latent Superposition for Efficient Visual Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11272–11288, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Forest Before Trees: Latent Superposition for Efficient Visual Reasoning (Wang et al., ACL 2026)
Copy Citation:
PDF:: https://aclanthology.org/2026.acl-long.517.pdf
Checklist:: 2026.acl-long.517.checklist.pdf

PDF Cite Search Checklist Fix data