Grouped Adaptive Weight Sharing (GAWS): An Inference-Efficient Adaptation Method for Large Language Models

Eman Alsuradi, Junhyun Lee, Kyenghun Lee, Hyeonmok Ko, Fahed Jubair


Abstract
Although Low-Rank Adaptation (LoRA) revolutionized parameter-efficient fine-tuning, it often incurs an inference overhead due to the extra computation required by adapter layers. While most literature focuses on maximizing accuracy or minimizing parameter counts, this paper prioritizes single-request inference performance in the unmerged adapter setting, where adapters must remain decoupled from the base model at runtime. By analyzing LoRA adapters on GPUs, we identify segmented function calls as the primary source of this latency. To address this, we propose Grouped Adaptive Weight Sharing (GAWS), a novel adapter design based on structured Kronecker product decomposition. Experiments on T5-3B, GPT-2 Large, LLaMA3.2-3B, and RoBERTa-Large show that GAWS reduces latency to about 40% of the gap between the unmerged LoRA and the base model, while maintaining parameter efficiency and comparable accuracy. This positions GAWS as a Pareto-efficient solution for deploying adapted LLMs in latency-sensitive settings, balancing the high latency of compressed adapters with the accuracy of LoRA. The source code is available at: https://github.com/SamsungLabs/GAWS.
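As a rough illustration of the setting the abstract describes, the sketch below contrasts an unmerged LoRA forward pass (base matmul plus two separate adapter matmuls) with a generic Kronecker-structured update applied without materializing the full matrix. This is not the GAWS design, which the abstract does not specify beyond "structured Kronecker product decomposition"; all names, shapes, and the NumPy setting are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
d_in, d_out, rank = 64, 48, 4

W = rng.standard_normal((d_in, d_out))   # frozen base weight
A = rng.standard_normal((d_in, rank))    # LoRA down-projection
B = rng.standard_normal((rank, d_out))   # LoRA up-projection
x = rng.standard_normal((1, d_in))       # a single request


def unmerged_lora(x):
    # The adapter stays decoupled from W: one base matmul,
    # two small adapter matmuls, and an add.
    return x @ W + (x @ A) @ B


def merged_lora(x):
    # Reference result when the update is folded into W ahead of time.
    return x @ (W + A @ B)


# Generic Kronecker-structured update: delta = kron(P, Q).
# Assumed factor shapes: (8, 6) and (8, 8), so delta is (64, 48).
P = rng.standard_normal((8, 6))
Q = rng.standard_normal((8, 8))


def kron_apply(x):
    # Computes x @ np.kron(P, Q) without building the (64, 48) matrix:
    # reshape each row of x to (8, 8), then contract with P and Q.
    X = x.reshape(-1, P.shape[0], Q.shape[0])
    Y = np.einsum("bij,ik,jl->bkl", X, P, Q)
    return Y.reshape(x.shape[0], -1)


def unmerged_kron(x):
    # Base output plus the Kronecker-structured adapter output.
    return x @ W + kron_apply(x)
```

In this sketch the two Kronecker factors hold 48 + 64 = 112 parameters, compared with 64·4 + 4·48 = 448 for the rank-4 LoRA pair. How GAWS groups and shares weights, and how it reduces the number of segmented GPU calls, is described in the paper itself rather than here.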
Anthology ID:
2026.findings-acl.1590
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
31790–31806
URL:
https://aclanthology.org/2026.findings-acl.1590/
Cite (ACL):
Eman Alsuradi, Junhyun Lee, Kyenghun Lee, Hyeonmok Ko, and Fahed Jubair. 2026. Grouped Adaptive Weight Sharing (GAWS): An Inference-Efficient Adaptation Method for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 31790–31806, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Grouped Adaptive Weight Sharing (GAWS): An Inference-Efficient Adaptation Method for Large Language Models (Alsuradi et al., Findings 2026)
PDF:
https://aclanthology.org/2026.findings-acl.1590.pdf
Checklist:
 2026.findings-acl.1590.checklist.pdf