IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs

Aosong Feng; Balasubramaniam Srinivasan; Yun Zhou; Zhichao Xu; Kang Zhou; Sheng Guan; Yueyan Chen; Xian Wu; Ninad Kulkarni; Yi Zhang; Zhengyuan Shen; Dmitriy Bespalov; Soumya Smruti Mishra; Yifei Teng; Darren Yow-Bang Wang; Haibo Ding; Lin Lee Cheong

Abstract

Routing incoming queries to the most cost-effective LLM while maintaining response quality poses a fundamental challenge in optimizing performance-cost trade-offs for large-scale commercial systems. We present IPR, a quality-constrained Intelligent Prompt Routing framework that dynamically selects optimal models based on predicted response quality and user-specified tolerance levels. IPR introduces three key innovations: (1) a modular architecture with lightweight quality estimators trained on 1.5M prompts annotated with calibrated quality scores, enabling fine-grained quality prediction across model families; (2) a user-controlled routing mechanism with tolerance parameter 𝜏 ∈ [0,1] that provides explicit control over quality-cost trade-offs; and (3) an extensible design using frozen encoders with model-specific adapters, reducing new model integration from days to hours. To rigorously train and evaluate IPR, we curate an industrial-level IPR dataset, a comprehensive benchmark containing 1.5 million examples with response quality annotations across 11 LLM candidates. Deployed on a major cloud platform, IPR achieves a 43.9% cost reduction while maintaining quality parity with the strongest model in the Claude family, and it processes requests with sub-150ms latency.
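The abstract does not spell out the routing rule, so the following is only a minimal sketch of one plausible reading of tolerance-based routing. It assumes that a per-model quality estimator has already produced a score in [0, 1], that 𝜏 = 0 means "match the best predicted quality" and 𝜏 = 1 means "accept any candidate", and that the threshold is interpolated linearly between the best and worst predicted scores. The `Candidate` class, the `route` function, the model labels, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical model label
    predicted_quality: float  # assumed estimator output in [0, 1]
    cost: float               # assumed cost per request, arbitrary units


def route(candidates: list[Candidate], tau: float) -> Candidate:
    """Pick the cheapest candidate whose predicted quality clears a threshold.

    Assumption: the threshold moves linearly from the best predicted
    quality (tau = 0) down to the worst predicted quality (tau = 1).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")

    best = max(c.predicted_quality for c in candidates)
    worst = min(c.predicted_quality for c in candidates)
    # Convex combination keeps the endpoints exact in floating point.
    threshold = (1.0 - tau) * best + tau * worst

    eligible = [c for c in candidates if c.predicted_quality >= threshold]
    # Cheapest eligible model; ties on cost go to the higher predicted quality.
    return min(eligible, key=lambda c: (c.cost, -c.predicted_quality))


if __name__ == "__main__":
    models = [
        Candidate("large", predicted_quality=0.9, cost=10.0),
        Candidate("medium", predicted_quality=0.8, cost=3.0),
        Candidate("small", predicted_quality=0.6, cost=1.0),
    ]
    for tau in (0.0, 0.5, 1.0):
        print(tau, route(models, tau).name)
```

Under these assumptions, raising 𝜏 can only widen the eligible set, so the cost of the selected model never increases as the user tolerates more quality loss.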

Anthology ID: 2025.emnlp-industry.170
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2484–2498
URL: https://aclanthology.org/2025.emnlp-industry.170/
Cite (ACL): Aosong Feng, Balasubramaniam Srinivasan, Yun Zhou, Zhichao Xu, Kang Zhou, Sheng Guan, Yueyan Chen, Xian Wu, Ninad Kulkarni, Yi Zhang, Zhengyuan Shen, Dmitriy Bespalov, Soumya Smruti Mishra, Yifei Teng, Darren Yow-Bang Wang, Haibo Ding, and Lin Lee Cheong. 2025. IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2484–2498, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs (Feng et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-industry.170.pdf
