MoKA: Parameter Efficiency Fine-Tuning via Mixture of Kronecker Product Adaption

Beiming Yu, Zhenfei Yang, Xiushuang Yi


Abstract
With the rapid development of large language models (LLMs), traditional full-parameter fine-tuning has become increasingly expensive in terms of computational resources and training time. Parameter-efficient fine-tuning (PEFT) methods have emerged to address this cost. Among them, Low-Rank Adaptation (LoRA) is currently one of the most popular PEFT methods and is widely applied to large language models. However, LoRA's low-rank update mechanism somewhat limits its ability to approximate full-parameter fine-tuning during training. In this paper, we propose a novel PEFT framework, MoKA (Mixture of Kronecker Product Adaptation), which combines the Kronecker product with the Mixture-of-Experts (MoE) approach. By replacing the low-rank decomposition of the weight update matrix with Kronecker products and employing a sparse MoE architecture, MoKA achieves both parameter efficiency and better model performance. Additionally, we design an efficient routing module that further reduces the number of trainable parameters. We conduct extensive experiments on the GLUE benchmark, the E2E NLG Challenge, and instruction-tuning tasks for LLMs. The results demonstrate that MoKA outperforms existing PEFT methods.
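To make the abstract's description concrete, the following is a minimal, illustrative PyTorch sketch of a Kronecker-product adapter combined with a sparse (top-1) mixture-of-experts router over a frozen linear layer. It is not the authors' implementation: all names (KroneckerExpert, MoKAAdapter, num_experts, a_rows, a_cols) and design choices (top-1 routing, a plain linear router, zero-initialized second factor) are assumptions for illustration; the paper's actual routing module and factorization details may differ.

```python
# Illustrative sketch only, in the spirit of the MoKA abstract:
# a Kronecker-product weight update per expert plus top-1 MoE routing.
# All class and parameter names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KroneckerExpert(nn.Module):
    """One expert: the weight update is a Kronecker product A (x) B
    instead of a LoRA-style low-rank product."""

    def __init__(self, in_features: int, out_features: int,
                 a_rows: int, a_cols: int):
        super().__init__()
        assert in_features % a_cols == 0 and out_features % a_rows == 0
        b_rows, b_cols = out_features // a_rows, in_features // a_cols
        # Small factor matrices; their Kronecker product has the full
        # (out_features, in_features) shape with far fewer parameters.
        self.A = nn.Parameter(torch.randn(a_rows, a_cols) * 0.02)
        self.B = nn.Parameter(torch.zeros(b_rows, b_cols))  # zero init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = torch.kron(self.A, self.B)   # (out_features, in_features)
        return F.linear(x, delta_w)            # x @ delta_w.T


class MoKAAdapter(nn.Module):
    """Frozen base linear layer plus a sparse (top-1) mixture of Kronecker experts."""

    def __init__(self, base_linear: nn.Linear, num_experts: int = 4,
                 a_rows: int = 8, a_cols: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():       # base weights stay frozen
            p.requires_grad_(False)
        # Simplified router; the paper describes a more parameter-efficient module.
        self.router = nn.Linear(base_linear.in_features, num_experts, bias=False)
        self.experts = nn.ModuleList([
            KroneckerExpert(base_linear.in_features, base_linear.out_features,
                            a_rows, a_cols)
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        gate = F.softmax(self.router(x), dim=-1)   # (..., num_experts)
        top_gate, top_idx = gate.max(dim=-1)       # top-1 routing per token
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = out[mask] + top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoKAAdapter(nn.Linear(768, 768), num_experts=4)
    y = layer(torch.randn(2, 16, 768))             # (batch, seq, hidden)
    print(y.shape)                                 # torch.Size([2, 16, 768])
```

With an 8x8 factor A and 96x96 factor B per expert, each expert adds roughly 9.3K trainable parameters for a 768x768 layer, versus about 12.3K for a rank-8 LoRA update, which is the kind of trade-off the Kronecker factorization is meant to exploit; the exact factor shapes in the paper may differ.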
Anthology ID:
2025.coling-main.679
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
10172–10182
URL:
https://aclanthology.org/2025.coling-main.679/
Cite (ACL):
Beiming Yu, Zhenfei Yang, and Xiushuang Yi. 2025. MoKA:Parameter Efficiency Fine-Tuning via Mixture of Kronecker Product Adaption. In Proceedings of the 31st International Conference on Computational Linguistics, pages 10172–10182, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
MoKA:Parameter Efficiency Fine-Tuning via Mixture of Kronecker Product Adaption (Yu et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.679.pdf