Xinference: Making Large Model Serving Easy

Weizheng Lu, Lingfeng Xiong, Feng Zhang, Xuye Qin, Yueguo Chen


Abstract
The proliferation of open-source large models necessitates dedicated tools for their deployment and accessibility. To mitigate the complexities of model serving, we develop Xinference, an open-source library designed to simplify the deployment and management of large models. Xinference simplifies deployment for users by (a) eliminating the need to write code, with built-in support for a wide range of models and OpenAI-compatible APIs; (b) enabling full lifecycle management of served models; (c) providing efficient and scalable inference with high throughput and low latency. In comparative experiments with similar products such as BentoML and Ray Serve, Xinference outperforms these tools and offers superior ease of use. Xinference is available at https://github.com/xorbitsai/inference.
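The abstract highlights OpenAI-compatible APIs as one way Xinference avoids custom serving code. As a minimal sketch of what "OpenAI-compatible" means in practice, the snippet below builds an OpenAI-style chat-completion request and posts it to a local endpoint using only the Python standard library. The endpoint URL (default port 9997), the `/v1/chat/completions` route, and the model name `"llama-2-chat"` are assumptions for illustration, not details taken from the paper.

```python
import json
import urllib.request

# Assumed local Xinference endpoint; adjust host/port for your deployment.
ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> bytes:
    """Build an OpenAI-style chat-completion request body as JSON bytes."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")


def chat(model: str, prompt: str) -> str:
    """POST the request to the local server and return the reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers return replies under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes match the OpenAI API, existing OpenAI client code can usually be pointed at such a server by changing only the base URL.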
Anthology ID:
2024.emnlp-demo.30
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Delia Irazu Hernandez Farias, Tom Hope, Manling Li
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
291–300
URL:
https://aclanthology.org/2024.emnlp-demo.30
Cite (ACL):
Weizheng Lu, Lingfeng Xiong, Feng Zhang, Xuye Qin, and Yueguo Chen. 2024. Xinference: Making Large Model Serving Easy. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 291–300, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Xinference: Making Large Model Serving Easy (Lu et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-demo.30.pdf