Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Hong Huang; Dapeng Wu

doi:10.18653/v1/2025.acl-long.325

Abstract

Large language models (LLMs) have achieved strong results across various domains, yet their deployment on resource-constrained personal devices remains hindered by the prohibitive computational and memory demands of task-specific fine-tuning. While quantization offers a pathway to efficiency, existing methods struggle to balance performance and overhead, either incurring high computational/memory costs or failing to address activation outliers, a critical bottleneck in quantized fine-tuning. To address these challenges, we propose the Outlier Spatial Stability Hypothesis (__OSSH__): _During fine-tuning, certain activation outlier channels retain stable spatial positions across training iterations._ Building on OSSH, we propose __Quaff__, a Quantized parameter-efficient fine-tuning framework for LLMs that optimizes low-precision activation representations through targeted momentum scaling. Quaff dynamically suppresses outliers exclusively in invariant channels using lightweight operations, eliminating full-precision weight storage and global rescaling while reducing quantization errors. Extensive experiments across ten benchmarks validate OSSH and demonstrate Quaff’s efficacy. Specifically, on the GPQA reasoning benchmark, Quaff achieves a 1.73× latency reduction and 30% memory savings over full-precision fine-tuning while improving accuracy by 0.6% on the Phi-3 model, reconciling the triple trade-off among efficiency, performance, and deployability. By enabling fine-tuning on consumer-grade GPUs (e.g., RTX 2080 Super) without sacrificing model utility, Quaff democratizes personalized LLM deployment. The code is available at https://anonymous.4open.science/r/Quaff-B322/.
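
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the intuition: if outliers stay in the same channels across iterations (the OSSH assumption), per-channel scale factors for just those channels can be tracked with a momentum update and applied before low-precision quantization. The function names, the exponential-moving-average rule, the `target` magnitude, and the synthetic activations are all assumptions made for illustration and are not taken from the Quaff code.

```python
import random


def quantize_int8(row):
    """Symmetric per-tensor int8 quantization; returns the dequantized values."""
    max_abs = max(abs(v) for v in row) or 1.0
    step = max_abs / 127.0
    return [max(-127, min(127, round(v / step))) * step for v in row]


def mean_abs_error(a, b):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def update_scales(scales, row, outlier_channels, target=1.0, momentum=0.9):
    """Momentum (EMA) update of scale factors, only for the given channels.

    Hypothetical rule: move each outlier channel's scale toward the factor
    that would shrink its current magnitude to `target`.
    """
    for c in outlier_channels:
        observed = max(abs(row[c]) / target, 1.0)
        scales[c] = momentum * scales[c] + (1.0 - momentum) * observed
    return scales


def quantize_with_scaling(row, scales):
    """Divide by per-channel scales, quantize, then undo the scaling."""
    scaled = [v / s for v, s in zip(row, scales)]
    dequantized = quantize_int8(scaled)
    return [v * s for v, s in zip(dequantized, scales)]


random.seed(0)
num_channels = 64
outlier_channels = [3]  # assumed to stay fixed across iterations (OSSH)
scales = [1.0] * num_channels

# Synthetic "training iterations": ordinary channels are small, while the
# outlier always appears in the same channel with a large magnitude.
for _ in range(50):
    row = [random.uniform(-1.0, 1.0) for _ in range(num_channels)]
    row[3] = random.uniform(50.0, 70.0)
    scales = update_scales(scales, row, outlier_channels)

# Compare quantization error on the last row, with and without scaling.
plain_err = mean_abs_error(row, quantize_int8(row))
scaled_err = mean_abs_error(row, quantize_with_scaling(row, scales))
print(f"plain: {plain_err:.4f}  scaled: {scaled_err:.4f}")
```

In this toy setting the single outlier dictates the quantization step for every channel, so the ordinary channels lose most of their resolution; scaling only the outlier channel restores it. How Quaff itself folds the scaling into the computation is described in the paper, not here.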

Anthology ID: 2025.acl-long.325
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6481–6496
URL: https://aclanthology.org/2025.acl-long.325/
DOI: 10.18653/v1/2025.acl-long.325
Cite (ACL): Hong Huang and Dapeng Wu. 2025. Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6481–6496, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis (Huang & Wu, ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.325.pdf
