Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation

Ao Shen; Zhiquan Lai; Qiang Wang; Xionglve Li; Lizhi Zhang; Dongsheng Li; Jiaxin Li

doi:10.1162/tacl.a.23

Abstract

Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions combine parameter quantization with Low-Rank Adaptation (LoRA), reducing memory usage but causing performance degradation. Additionally, converting fine-tuned models to low-precision representations further degrades performance. In this paper, we identify an imbalance in fine-tuning quantized LLMs with LoRA: overly complex adapter inputs and outputs versus low effective trainability of the adapter, leading to underfitting during fine-tuning. Thus, we propose Quantized LLMs fine-tuning with Balanced Low-Rank Adaptation (Q-BLoRA), which simplifies the adapter’s inputs and outputs while increasing the adapter’s rank to alleviate underfitting during fine-tuning. For low-precision deployment, we propose Quantization-Aware fine-tuning with Balanced Low-Rank Adaptation (QA-BLoRA), which aligns with the block-wise quantization and facilitates quantization-aware fine-tuning of low-rank adaptation based on the parameter merging of Q-BLoRA. Both Q-BLoRA and QA-BLoRA are easily implemented and offer the following optimizations: (i) Q-BLoRA consistently achieves state-of-the-art accuracy compared to baselines and other variants; (ii) QA-BLoRA enables the direct generation of low-precision inference models, which exhibit significant performance improvements over other low-precision models. We validate the effectiveness of Q-BLoRA and QA-BLoRA across various models and scenarios. Code has been made available at https://github.com/xiaocaigou/qbaraqahira.
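The background setting the abstract describes (a frozen, block-wise quantized weight with a LoRA adapter on top, and the extra error introduced when the merged weight is converted back to low precision) can be sketched in a few lines of NumPy. This is only an illustration of that baseline setting, not the authors' Q-BLoRA or QA-BLoRA. The uniform min-max 4-bit quantizer, the block size of 32, the random "trained" adapter, and all function names are assumptions made for the example.

```python
import numpy as np

def quantize_blockwise(w, block_size=32, bits=4):
    """Uniform min-max quantization, applied independently to each block of a row.
    (Assumed quantizer for illustration; real systems often use other formats.)"""
    rows, cols = w.shape
    assert cols % block_size == 0, "columns must be divisible by block_size"
    blocks = w.reshape(rows, cols // block_size, block_size)
    lo = blocks.min(axis=-1, keepdims=True)
    hi = blocks.max(axis=-1, keepdims=True)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant blocks
    codes = np.clip(np.round((blocks - lo) / scale), 0, levels).astype(np.uint8)
    return codes, scale, lo

def dequantize_blockwise(codes, scale, lo):
    """Map integer codes back to floating point and restore the 2-D weight shape."""
    blocks = codes.astype(np.float64) * scale + lo
    return blocks.reshape(blocks.shape[0], -1)

def lora_forward(x, w_deq, a, b, alpha):
    """Frozen dequantized weight plus a scaled low-rank update B @ A."""
    rank = a.shape[0]
    return x @ w_deq.T + (alpha / rank) * (x @ a.T) @ b.T

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 128, 8, 16.0

# Frozen base weight, stored in 4-bit block-wise form.
w = rng.normal(size=(d_out, d_in))
codes, scale, lo = quantize_blockwise(w)
w_deq = dequantize_blockwise(codes, scale, lo)

# Random nonzero A and B stand in for an adapter that has already been trained.
a = rng.normal(scale=0.1, size=(rank, d_in))
b = rng.normal(scale=0.1, size=(d_out, rank))
x = rng.normal(size=(4, d_in))

# Output during fine-tuning: quantized base weight plus full-precision adapter.
y_adapter = lora_forward(x, w_deq, a, b, alpha)

# Merging the adapter gives a full-precision weight with the same output...
w_merged = w_deq + (alpha / rank) * (b @ a)

# ...but converting the merged weight back to 4 bits changes the output.
w_requant = dequantize_blockwise(*quantize_blockwise(w_merged))
y_requant = x @ w_requant.T
merge_error = np.abs(y_adapter - y_requant).max()
print(f"max output change after requantizing the merged weight: {merge_error:.4f}")
```

The merged weight reproduces the adapter output exactly in full precision, while requantizing it does not. That gap is the kind of degradation the abstract attributes to converting fine-tuned models to low-precision representations.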

Anthology ID: 2025.tacl-1.40
Volume: Transactions of the Association for Computational Linguistics, Volume 13
Year: 2025
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 861–877
URL: https://aclanthology.org/2025.tacl-1.40/
DOI: 10.1162/tacl.a.23
Cite (ACL): Ao Shen, Zhiquan Lai, Qiang Wang, Xionglve Li, Lizhi Zhang, Dongsheng Li, and Jiaxin Li. 2025. Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation. Transactions of the Association for Computational Linguistics, 13:861–877.
Cite (Informal): Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation (Shen et al., TACL 2025)
PDF: https://aclanthology.org/2025.tacl-1.40.pdf