Title: ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency
Authors: Yuhang Yao, Han Jin, Alay Dilipbhai Shah, Shanshan Han, Zijian Hu, Dimitris Stripelis, Yide Ran, Zhaozhuo Xu, Salman Avestimehr, Chaoyang He
Date: 2024-11
Resource type: text
Genre: conference publication
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Publisher: Association for Computational Linguistics
Location: Miami, Florida, US
Pages: 279-289
Citation key: yao-etal-2024-scalellm
DOI: 10.18653/v1/2024.emnlp-industry.22
URL: https://aclanthology.org/2024.emnlp-industry.22/