Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Zichuan Fu; Xian Wu; Guojing Li; Yejing Wang; Yijun Chen; Zhao Zihao; Luo Yixuan; Hanyu Yan; Yefeng Zheng; Xiangyu Zhao

Abstract

Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, in which models perform explicit step-by-step reasoning before generating final answers. While such approaches improve answer quality and interpretability, they incur substantial computational overhead because of the prolonged generation sequences. In this paper, we propose Tandem, a novel collaborative framework that pairs LLMs with small language models (SLMs) to achieve high-quality reasoning at significantly reduced computational cost. Specifically, the LLM serves as a strategic coordinator, efficiently generating a compact set of critical reasoning insights. These insights then guide a smaller, more efficient SLM, which executes the full reasoning process and delivers the final response. To balance efficiency and reliability, Tandem introduces a cost-aware termination mechanism that adaptively determines when sufficient reasoning guidance has been accumulated, enabling early stopping of the LLM’s generation. Experiments on mathematical reasoning and code generation benchmarks demonstrate that Tandem reduces computational costs by approximately 40% compared to standalone LLM reasoning, while achieving superior or competitive performance. Furthermore, the sufficiency classifier trained on one domain transfers effectively to others without retraining. The code is available at: https://github.com/Applied-MachineLearning-Lab/ACL2026_Tandem.
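The coordinator/executor loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm_next_insight`, `sufficiency_prob`, and `slm_solve` are hypothetical callables standing in for the LLM coordinator, the sufficiency classifier, and the SLM, and the stopping rule (a probability threshold plus a hard cap on LLM steps) is an assumed simplification of the paper's cost-aware termination mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signature shared by all three components: (question, insights so far) -> value.
InsightFn = Callable[[str, List[str]], str]
ScoreFn = Callable[[str, List[str]], float]


@dataclass
class TandemResult:
    answer: str
    insights: List[str] = field(default_factory=list)
    llm_steps: int = 0                  # number of insights the (expensive) LLM produced
    stopped_by_classifier: bool = False  # True if the sufficiency check ended the LLM phase


def tandem_reason(
    question: str,
    llm_next_insight: InsightFn,   # hypothetical: LLM emits one short reasoning insight
    sufficiency_prob: ScoreFn,     # hypothetical: classifier's P(guidance is sufficient)
    slm_solve: InsightFn,          # hypothetical: SLM reasons in full and answers
    threshold: float = 0.8,        # assumed stopping threshold
    max_llm_steps: int = 5,        # assumed hard budget on LLM calls
) -> TandemResult:
    insights: List[str] = []
    for step in range(1, max_llm_steps + 1):
        # The LLM only produces compact guidance, not the full reasoning chain.
        insights.append(llm_next_insight(question, insights))
        # Early stopping: hand off to the SLM as soon as guidance looks sufficient.
        if sufficiency_prob(question, insights) >= threshold:
            return TandemResult(slm_solve(question, insights), insights, step, True)
    # Budget exhausted: the SLM answers with whatever guidance was collected.
    return TandemResult(slm_solve(question, insights), insights, max_llm_steps, False)


# Toy stand-ins (purely illustrative) so the loop can be run end to end.
HINTS = ["Let x be the unknown.", "Set up 2x + 3 = 11.", "Isolate x.", "Check the result."]


def toy_llm(q: str, ins: List[str]) -> str:
    return HINTS[min(len(ins), len(HINTS) - 1)]


def toy_sufficiency(q: str, ins: List[str]) -> float:
    return 0.3 * len(ins)  # confidence grows as insights accumulate


def toy_slm(q: str, ins: List[str]) -> str:
    return "x = 4" if any("2x + 3 = 11" in s for s in ins) else "unknown"


if __name__ == "__main__":
    result = tandem_reason("Solve 2x + 3 = 11", toy_llm, toy_sufficiency, toy_slm)
    print(result.answer, result.llm_steps, result.stopped_by_classifier)
```

With these toy stubs the classifier score crosses the threshold after the third insight, so the LLM stops early and the SLM produces the answer. The design point the sketch tries to capture is that LLM cost scales with the number of insights generated, so an accurate sufficiency check is what lets the framework save computation.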

Anthology ID: 2026.findings-acl.2098
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 42286–42302
URL: https://aclanthology.org/2026.findings-acl.2098/
Cite (ACL): Zichuan Fu, Xian Wu, Guojing Li, Yejing Wang, Yijun Chen, Zhao Zihao, Luo Yixuan, Hanyu Yan, Yefeng Zheng, and Xiangyu Zhao. 2026. Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42286–42302, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning (Fu et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.2098.pdf
Checklist: 2026.findings-acl.2098.checklist.pdf