Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

Qimin Zhong; Hao Liao; Haiming Qin; Mingyang Zhou; Rui Mao; Wei Chen; Naipeng Chao

Abstract

Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) relies on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical analysis of the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, where discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method, **Latent Semantic Enhancement MTP (LSE-MTP)**, which anchors predictions to ground-truth hidden state trajectories. Experiments on synthetic graphs and the real-world Manhattan Taxi Ride dataset show that LSE-MTP effectively bridges the gap between discrete tokens and continuous state representations, enhancing representation alignment, reducing structural hallucinations, and improving robustness to perturbations.
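To make the contrast between NTP and MTP concrete, the following is a minimal sketch, assuming the common MTP setup in which head j predicts the token j steps ahead and the per-head cross-entropy losses are averaged; with k = 1 it reduces to ordinary NTP. The latent anchoring term is a hypothetical stand-in for the idea of anchoring predictions to ground-truth hidden state trajectories (a mean squared distance, weighted by an assumed coefficient `lam`). It is not the paper's LSE-MTP objective, which may differ. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: head_logits(t, j) returns the logits that head j
# (j = 1..k) produces from the prefix ending at position t.
Logits = Callable[[int, int], Sequence[float]]


def mtp_targets(tokens: Sequence[int], k: int) -> List[List[int]]:
    """For each position t, the next k tokens x_{t+1}..x_{t+k}, truncated at the end."""
    return [list(tokens[t + 1 : t + 1 + k]) for t in range(len(tokens) - 1)]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mtp_loss(tokens: Sequence[int], k: int, head_logits: Logits) -> float:
    """Average cross-entropy over all (position, offset) pairs; k = 1 gives NTP."""
    total, count = 0.0, 0
    for t, future in enumerate(mtp_targets(tokens, k)):
        for j, target in enumerate(future, start=1):
            total += cross_entropy(head_logits(t, j), target)
            count += 1
    return total / count


def latent_anchor_loss(pred_states: Sequence[Sequence[float]],
                       true_states: Sequence[Sequence[float]]) -> float:
    """Hypothetical anchoring term: mean squared distance to ground-truth hidden states."""
    sq = [sum((p - q) ** 2 for p, q in zip(ps, ts))
          for ps, ts in zip(pred_states, true_states)]
    return sum(sq) / len(sq)


def lse_mtp_loss(tokens: Sequence[int], k: int, head_logits: Logits,
                 pred_states: Sequence[Sequence[float]],
                 true_states: Sequence[Sequence[float]],
                 lam: float = 0.5) -> float:
    """Assumed combination: token-level MTP loss plus a weighted latent anchor."""
    return mtp_loss(tokens, k, head_logits) + lam * latent_anchor_loss(pred_states, true_states)
```

For example, with uniform logits over a 4-token vocabulary every cross-entropy term equals log 4, so `mtp_loss([0, 1, 2, 3], 2, lambda t, j: [0.0] * 4)` returns log 4 for any k. The token-level loss alone only scores discrete predictions; the extra latent term is one simple way to penalize hidden states that drift from the true state trajectory.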

Anthology ID: 2026.acl-long.618
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13582–13602
URL: https://aclanthology.org/2026.acl-long.618/
Cite (ACL): Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, and Naipeng Chao. 2026. Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13582–13602, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement (Zhong et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.618.pdf
Checklist: 2026.acl-long.618.checklist.pdf