TensorOpera Router: A Multi-Model Router for Efficient LLM Inference
Authors: Dimitris Stripelis, Zhaozhuo Xu, Zijian Hu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Jipeng Zhang, Tong Zhang, Salman Avestimehr, Chaoyang He
Date: 2024-11
In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track (conference publication)
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Publisher: Association for Computational Linguistics
Location: Miami, Florida, US
Pages: 452-462
Citation key: stripelis-etal-2024-tensoropera
DOI: 10.18653/v1/2024.emnlp-industry.34
URL: https://aclanthology.org/2024.emnlp-industry.34/