Weizheng Lu


2024

Xinference: Making Large Model Serving Easy
Weizheng Lu | Lingfeng Xiong | Feng Zhang | Xuye Qin | Yueguo Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The proliferation of open-source large models necessitates dedicated tools for deployment and accessibility. To mitigate the complexities of model serving, we develop Xinference, an open-source library designed to simplify the deployment and management of large models. Xinference simplifies deployment for users by (a) eliminating the need to write code, with built-in support for a wide range of models and OpenAI-compatible APIs; (b) enabling full lifecycle management of served models; and (c) guaranteeing efficient, scalable inference with high throughput and low latency. In comparative experiments with similar products such as BentoML and Ray Serve, Xinference outperforms these tools and offers superior ease of use. Xinference is available at https://github.com/xorbitsai/inference.
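Because Xinference exposes OpenAI-compatible APIs, a served model can be queried with an ordinary HTTP request. The sketch below builds such a request using only the Python standard library; the endpoint URL, port, and model name are illustrative assumptions, not values confirmed by the paper.

```python
# Hedged sketch: querying an Xinference server through its
# OpenAI-compatible chat-completions REST endpoint.
# The URL, port, and model name below are assumptions for illustration.
import json
import urllib.request

XINFERENCE_URL = "http://localhost:9997/v1/chat/completions"  # assumed endpoint

# OpenAI-style request body: a model previously launched on the server
# plus a list of chat messages.
payload = {
    "model": "llama-2-chat",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello!"}],
}

request = urllib.request.Request(
    XINFERENCE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Actually sending the request requires a running Xinference server:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]

print(request.get_method(), request.full_url)
```

Because the request shape follows the OpenAI chat-completions schema, existing OpenAI client libraries can also be pointed at the server by overriding their base URL, which is what makes the compatibility claim useful in practice.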