QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models

Authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Location: Miami, Florida, USA
Date: 2024-11
Type: text (conference publication)
Pages: 3355-3371
Identifier: ashkboos-etal-2024-quik
DOI: 10.18653/v1/2024.emnlp-main.197
URL: https://aclanthology.org/2024.emnlp-main.197/