Think Better, Not Longer: Token-Level Marginal Utility for Efficient Reasoning in Large Reasoning Models

Jiawei Li; Yang Gao; Huashan Sun; Chong Feng (冯冲)

Abstract

While Large Reasoning Models (LRMs) have demonstrated remarkable capabilities through explicit Chain-of-Thought (CoT) generation, they frequently suffer from “overthinking”. In this work, we address this problem by introducing Token-level Marginal Utility, which quantifies the per-token log-probability gain of the ground-truth answer. Leveraging this dense supervision signal, we propose MUTO (Marginal Utility Guided Thinking Optimization), a unified training framework designed to synthesize concise reasoning chains. Rather than relying only on coarse trajectory-level length control, MUTO identifies tokens that reduce the model’s likelihood of the correct answer and penalizes such negative-utility reasoning, yielding concise yet effective CoT trajectories. Experiments on DeepSeek-R1-Distill-Qwen backbones (1.5B and 7B) across six math reasoning benchmarks show that MUTO yields a markedly better efficiency-accuracy Pareto frontier. It reduces average token usage by 87.1% at 1.5B while improving accuracy by 2.3%, and cuts tokens by 80.2% at 7B with only a 0.1% drop in accuracy, achieving the best length-normalized accuracy among the compared methods.
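The abstract describes token-level marginal utility only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes we already have the log-probability of the ground-truth answer after each reasoning prefix, and it takes the utility of token t to be the difference between consecutive values. The function names, the zero threshold, and the toy numbers are all hypothetical.

```python
from typing import List


def marginal_utilities(answer_logprobs: List[float]) -> List[float]:
    """Per-token gain in the answer log-probability.

    Assumption: answer_logprobs[t] is log p(answer | question, first t
    reasoning tokens), so index 0 means no reasoning has been read yet.
    The utility of reasoning token t is the change it causes in that value.
    """
    return [after - before
            for before, after in zip(answer_logprobs, answer_logprobs[1:])]


def negative_utility_positions(utilities: List[float],
                               threshold: float = 0.0) -> List[int]:
    """Indices of reasoning tokens whose utility falls below the threshold."""
    return [i for i, u in enumerate(utilities) if u < threshold]


# Toy values (hypothetical, not taken from the paper).
logps = [-4.0, -3.0, -3.5, -1.0, -1.25]
utils = marginal_utilities(logps)
flagged = negative_utility_positions(utils)

print(utils)    # [1.0, -0.5, 2.5, -0.25]
print(flagged)  # [1, 3]
```

Under this reading the utilities telescope: their sum equals the total change in the answer log-probability across the whole chain, so the flagged tokens are exactly those that moved the model away from the correct answer. How MUTO turns such tokens into a training penalty is not specified in the abstract and is not modeled here.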

Anthology ID: 2026.acl-long.1386
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 30052–30063
URL: https://aclanthology.org/2026.acl-long.1386/
Cite (ACL): Jiawei Li, Yang Gao, Huashan Sun, and Chong Feng. 2026. Think Better, Not Longer: Token-Level Marginal Utility for Efficient Reasoning in Large Reasoning Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30052–30063, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Think Better, Not Longer: Token-Level Marginal Utility for Efficient Reasoning in Large Reasoning Models (Li et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1386.pdf
Checklist: 2026.acl-long.1386.checklist.pdf
