CRVQ: Channel-Relaxed Vector Quantization for Extreme Compression of LLMs

Yuzhuang Xu, Shiyu Ji, Qingfu Zhu, Wanxiang Che


Abstract
Powerful large language models (LLMs) are increasingly expected to be deployed at lower computational cost, bringing their capabilities to resource-constrained devices. Post-training quantization (PTQ) has emerged as a leading approach to this end, with the best methods compressing weights to fewer than 2 bits on average. In this paper, we propose Channel-Relaxed Vector Quantization (CRVQ), a novel technique that significantly improves the performance of PTQ baselines at the cost of only a minimal number of additional bits. This state-of-the-art extreme compression method achieves its results through two key innovations: (1) carefully selecting and reordering a very small subset of critical weight channels, and (2) leveraging extended codebooks to relax the constraint on critical channels. With our method, we demonstrate a 38.9% improvement over the current strongest sub-2-bit PTQ baseline, enabling nearly lossless 1-bit compression. Furthermore, our approach offers flexible customization of quantization bit-width and performance, providing a wider range of deployment options for diverse hardware platforms. Code and checkpoints are available at https://github.com/xuyuzhuang11/CRVQ.
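
For intuition, the sketch below illustrates the two ideas named in the abstract on a toy weight matrix: ranking input channels by a simple importance proxy to pick a small "critical" subset, and quantizing those channels with a larger (extended) codebook than the remaining channels. The function names, the L2-norm importance proxy, and the codebook sizes are illustrative assumptions for this sketch, not the paper's exact procedure.

# Minimal, illustrative sketch of channel-relaxed vector quantization.
# Assumptions (not from the paper): L2-norm channel importance, plain
# k-means codebooks, and the specific codebook sizes below.
import numpy as np

def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Build a codebook of k centroids over the row vectors with plain k-means."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # Update centroids; keep the old centroid if a cluster is empty.
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def quantize_with_codebook(vectors, codebook):
    """Replace each vector by its nearest codebook entry."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(1)
    return codebook[idx], idx

def crvq_like_quantize(W, n_critical=8, vec_dim=4, k_base=16, k_ext=256):
    """Quantize W (out_features x in_features), giving a few high-importance
    input channels a larger (extended) codebook than the rest."""
    # (1) Rank input channels by L2 norm as a simple importance proxy.
    importance = np.linalg.norm(W, axis=0)
    order = np.argsort(-importance)
    critical, regular = order[:n_critical], order[n_critical:]

    W_hat = np.empty_like(W)
    for cols, k in ((regular, k_base), (critical, k_ext)):
        sub = W[:, cols]                          # gather the selected channels
        vecs = sub.reshape(-1, vec_dim)           # split weights into short vectors
        codebook = kmeans_codebook(vecs, min(k, len(vecs)))
        q, _ = quantize_with_codebook(vecs, codebook)
        W_hat[:, cols] = q.reshape(sub.shape)     # (2) critical channels use k_ext
    return W_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 64)).astype(np.float32)
    W_hat = crvq_like_quantize(W)
    print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))

The point of the sketch is the asymmetry: most channels share a tiny codebook (low average bit-width), while the handful of critical channels are "relaxed" with a larger one, so the extra bits are spent only where they matter.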
Anthology ID:
2025.tacl-1.68
Volume:
Transactions of the Association for Computational Linguistics, Volume 13
Year:
2025
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
1488–1506
URL:
https://aclanthology.org/2025.tacl-1.68/
DOI:
10.1162/tacl.a.45
Cite (ACL):
Yuzhuang Xu, Shiyu Ji, Qingfu Zhu, and Wanxiang Che. 2025. CRVQ: Channel-Relaxed Vector Quantization for Extreme Compression of LLMs. Transactions of the Association for Computational Linguistics, 13:1488–1506.
Cite (Informal):
CRVQ: Channel-Relaxed Vector Quantization for Extreme Compression of LLMs (Xu et al., TACL 2025)
PDF:
https://aclanthology.org/2025.tacl-1.68.pdf