Low-Bit Quantization Favors Undertrained LLMs

Xu Ouyang; Tao Ge; Thomas Hartvigsen; Zhisong Zhang; Haitao Mi; Dong Yu (于东)

doi:10.18653/v1/2025.acl-long.1555

Abstract

Low-bit quantization improves machine learning model efficiency but, surprisingly, favors undertrained large language models (LLMs). Larger models or those trained on fewer tokens exhibit less quantization-induced degradation (QiD), while smaller, well-trained models face significant performance losses. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws that relate QiD to three factors: the number of training tokens, model size, and bit width. With our derived scaling laws, we propose a novel perspective: QiD can be used to measure an LLM’s training level and to determine the number of training tokens required to fully train LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model’s training level when evaluating low-bit quantization research. To facilitate future research on this problem, we release all the 1500+ quantized checkpoints used in this work at https://huggingface.co/Xu-Ouyang.
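The qualitative trend described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fitted scaling law, so the power-law form below, the function names `qid` and `tokens_for_qid`, and every constant (`k`, `alpha`, `beta`, `gamma`) are assumptions chosen only for illustration, not values from the paper. The sketch shows QiD growing with training tokens and shrinking with model size and bit width, and how such a relation could be inverted to estimate a token count from an observed QiD.

```python
import math


def qid(n_params, n_tokens, bits, k=1.0, alpha=0.5, beta=0.5, gamma=1.0):
    """Hypothetical power-law for quantization-induced degradation (QiD).

    Assumed form (not taken from the paper):
        QiD = k * tokens^beta / (params^alpha * bits^gamma)
    All constants are placeholders, not fitted values.
    """
    return k * n_tokens ** beta / (n_params ** alpha * bits ** gamma)


def tokens_for_qid(target_qid, n_params, bits, k=1.0, alpha=0.5, beta=0.5, gamma=1.0):
    """Invert the assumed law: training tokens at which QiD reaches target_qid."""
    return (target_qid * n_params ** alpha * bits ** gamma / k) ** (1.0 / beta)


if __name__ == "__main__":
    base = qid(n_params=1e9, n_tokens=1e11, bits=4)
    more_tokens = qid(n_params=1e9, n_tokens=1e12, bits=4)
    bigger_model = qid(n_params=1e10, n_tokens=1e11, bits=4)
    more_bits = qid(n_params=1e9, n_tokens=1e11, bits=8)
    print("more tokens raises QiD:", more_tokens > base)
    print("bigger model lowers QiD:", bigger_model < base)
    print("higher bit width lowers QiD:", more_bits < base)
```

Under these assumptions, inverting the relation with `tokens_for_qid` mirrors the abstract's idea of using an observed QiD as a proxy for how thoroughly a model of a given size has been trained.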

Anthology ID: 2025.acl-long.1555
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 32338–32348
URL: https://aclanthology.org/2025.acl-long.1555/
DOI: 10.18653/v1/2025.acl-long.1555
Cite (ACL): Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, and Dong Yu. 2025. Low-Bit Quantization Favors Undertrained LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32338–32348, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Low-Bit Quantization Favors Undertrained LLMs (Ouyang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1555.pdf
