Weiwei Zhang
2024
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Wenhua Cheng
|
Weiwei Zhang
|
Haihao Shen
|
Yiyang Cai
|
Xin He
|
Lv Kaokao
|
Yi Liu
Findings of the Association for Computational Linguistics: EMNLP 2024
Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution to address these challenges. Previous research suggests that fine-tuning through up and down rounding can enhance performance. In this study, we introduce SignRound, a method that utilizes signed gradient descent (SignSGD) to optimize rounding values and weight clipping within just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), achieving exceptional results across 2 to 4 bits while maintaining low tuning costs and avoiding additional inference overhead. For example, SignRound achieves absolute average accuracy improvements ranging from 6.91% to 33.22% at 2 bits, as measured by the average zero-shot accuracy across 11 tasks. It also demonstrates strong generalization to recent models, achieving near-lossless 4-bit quantization in most scenarios. The source code will be made publicly available.