MTRouter: Cost-Aware Multi-Turn LLM Routing with History–Model Joint Embeddings

Yiqun Zhang; Hao Li; Zihan Wang; Shi Feng; Xiaocui Yang; Daling Wang; Bo Zhang; Lei Bai; Shuyue Hu

Abstract

Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history–model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance–cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity’s Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several mechanisms underlying its effectiveness: relative to prior multi-turn routers, MTRouter makes fewer model switches, is more tolerant to transient errors, and exhibits emergent specialization across models.

Code: https://github.com/ZhangYiqun018/MTRouter
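The turn-level decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Candidate`, `route_turn`, and `toy_estimator`, the flat per-turn cost, the `cost_weight` penalty, and the hard-coded utilities are all assumptions made for illustration. The toy estimator stands in for the learned outcome estimator over joint history–model embeddings, whose actual form is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A model in the pool (hypothetical structure, not from the paper)."""
    name: str
    cost_per_turn: float  # assumed flat cost per invocation


def route_turn(
    history: Sequence[str],
    candidates: Sequence[Candidate],
    estimate_utility: Callable[[Sequence[str], Candidate], float],
    remaining_budget: float,
    cost_weight: float = 1.0,
) -> Optional[Candidate]:
    """Pick the affordable model with the best utility-minus-cost score.

    Returns None when no candidate fits in the remaining budget.
    The linear cost penalty is an assumption, not the paper's objective.
    """
    affordable = [c for c in candidates if c.cost_per_turn <= remaining_budget]
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda c: estimate_utility(history, c) - cost_weight * c.cost_per_turn,
    )


def toy_estimator(history: Sequence[str], candidate: Candidate) -> float:
    """Stand-in for a learned outcome estimator (values are made up)."""
    if candidate.name == "large":
        return 0.9
    # Assume the small model is adequate early and degrades on long histories.
    return 0.7 if len(history) < 3 else 0.3


pool = [Candidate("small", 0.1), Candidate("large", 1.0)]

early = route_turn(["task"], pool, toy_estimator, remaining_budget=5.0, cost_weight=0.3)
late = route_turn(["t1", "t2", "t3", "t4"], pool, toy_estimator, remaining_budget=5.0, cost_weight=0.3)
print(early.name, late.name)
```

In this toy setup the cheaper model is chosen while its estimated utility is close to the larger model's, and the router switches only when the estimated gap outweighs the cost penalty.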

Anthology ID: 2026.acl-long.2045
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 44206–44226
URL: https://aclanthology.org/2026.acl-long.2045/
Cite (ACL): Yiqun Zhang, Hao Li, Zihan Wang, Shi Feng, Xiaocui Yang, Daling Wang, Bo Zhang, Lei Bai, and Shuyue Hu. 2026. MTRouter: Cost-Aware Multi-Turn LLM Routing with History–Model Joint Embeddings. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44206–44226, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MTRouter: Cost-Aware Multi-Turn LLM Routing with History–Model Joint Embeddings (Zhang et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2045.pdf
Checklist: 2026.acl-long.2045.checklist.pdf
