Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition

Yoonjun Cho; Soeun Kim; Dongjae Jeon; Kyelim Lee; Beomsoo Lee; Albert No

doi:10.18653/v1/2025.findings-acl.746

Abstract

Decomposing weight matrices into quantization and low-rank components (W ≈ Q + LR) is a widely used technique for compressing large language models (LLMs). Existing joint optimization methods iteratively alternate between quantization and low-rank approximation. However, these methods tend to prioritize one component at the expense of the other, resulting in suboptimal decompositions that fail to leverage each component’s unique strengths. In this work, we introduce Outlier-Driven Low-Rank Initialization (ODLRI), which assigns low-rank components the specific role of capturing activation-sensitive weights. This structured decomposition mitigates outliers’ negative impact on quantization, enabling a more effective balance between quantization and low-rank approximation. Experiments on Llama2 (7B, 13B, 70B), Llama3-8B, and Mistral-7B demonstrate that incorporating ODLRI into the joint optimization framework consistently reduces activation-aware error, minimizes quantization scale, and improves perplexity and zero-shot accuracy in low-bit settings.
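The generic alternating scheme that the abstract refers to (W ≈ Q + LR) can be illustrated with a minimal NumPy sketch. This is not the authors' ODLRI method and not any specific published baseline. It assumes a simple symmetric round-to-nearest quantizer, a truncated SVD for the low-rank step, and plain Frobenius error rather than the activation-aware error used in the paper. All function names and parameters (`quantize_rtn`, `low_rank`, `alternating_decompose`, `rank`, `bits`, `iters`) are hypothetical.

```python
import numpy as np


def quantize_rtn(w, bits=4):
    """Symmetric per-tensor round-to-nearest quantizer (an assumed, simple choice)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Snap to the integer grid, clip to the signed range, then map back to floats.
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def low_rank(w, rank):
    """Best rank-`rank` approximation in Frobenius norm via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def alternating_decompose(w, rank=8, bits=4, iters=10):
    """Alternate between quantizing W - LR and low-rank fitting W - Q.

    The alternation is not guaranteed to decrease the error at every step
    with this quantizer, so the best iterate seen so far is returned.
    """
    left = np.zeros((w.shape[0], rank))
    right = np.zeros((rank, w.shape[1]))
    best = None
    for _ in range(iters):
        q = quantize_rtn(w - left @ right, bits)   # quantization step
        left, right = low_rank(w - q, rank)        # low-rank step
        err = np.linalg.norm(w - (q + left @ right))
        if best is None or err < best[0]:
            best = (err, q, left, right)
    return best[1], best[2], best[3]


# Toy weight matrix (random data, for illustration only).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

q, left, right = alternating_decompose(w)
err_joint = np.linalg.norm(w - (q + left @ right))
err_quant_only = np.linalg.norm(w - quantize_rtn(w))
```

In this sketch the low-rank factors start at zero, so the first quantization step sees the full weight matrix, outliers included. The abstract's proposal concerns exactly this initialization: giving the low-rank part the job of absorbing activation-sensitive weights before quantization. That step is not reproduced here because the abstract does not specify it.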

Anthology ID: 2025.findings-acl.746
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14453–14470
URL: https://aclanthology.org/2025.findings-acl.746/
DOI: 10.18653/v1/2025.findings-acl.746
Cite (ACL): Yoonjun Cho, Soeun Kim, Dongjae Jeon, Kyelim Lee, Beomsoo Lee, and Albert No. 2025. Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14453–14470, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition (Cho et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.746.pdf
