BMInf: An Efficient Toolkit for Big Model Inference and Tuning

Xu Han, Guoyang Zeng, Weilin Zhao, Zhiyuan Liu, Zhengyan Zhang, Jie Zhou, Jun Zhang, Jia Chao, Maosong Sun


Abstract
In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks. Although we can pre-train these big models by stacking computing clusters at any cost, it is impractical to use such huge computing resources to apply big models for each downstream task. To address the computation bottleneck encountered in deploying big models in real-world scenarios, we introduce an open-source toolkit for big model inference and tuning (BMInf), which can support big model inference and tuning at extremely low computation cost. More specifically, at the algorithm level, we introduce model quantization and parameter-efficient tuning for efficient model inference and tuning. At the implementation level, we apply model offloading, model checkpointing, and CPU-GPU scheduling optimization to further reduce the computation and memory cost of big models. Based on the above efforts, we can efficiently perform big model inference and tuning with a single GPU (even a consumer-level GPU like GTX 1060) instead of computing clusters, which is difficult for existing distributed learning toolkits for PLMs. BMInf is publicly released at https://github.com/OpenBMB/BMInf.
Anthology ID:
2022.acl-demo.22
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Valerio Basile, Zornitsa Kozareva, Sanja Stajner
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
224–230
URL:
https://aclanthology.org/2022.acl-demo.22
DOI:
10.18653/v1/2022.acl-demo.22
Cite (ACL):
Xu Han, Guoyang Zeng, Weilin Zhao, Zhiyuan Liu, Zhengyan Zhang, Jie Zhou, Jun Zhang, Jia Chao, and Maosong Sun. 2022. BMInf: An Efficient Toolkit for Big Model Inference and Tuning. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 224–230, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
BMInf: An Efficient Toolkit for Big Model Inference and Tuning (Han et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-demo.22.pdf
Code:
openbmb/bminf