Multi-matrix Factorization Attention

Jingcheng Hu; Houyi Li; Yinmin Zhang; Zili Wang; Shuigeng Zhou; Xiangyu Zhang; Heung-Yeung Shum

doi:10.18653/v1/2025.findings-acl.1288

Abstract

We propose two novel attention architectures, Multi-matrix Factorization Attention (MFA) and MFA-Key-Reuse (MFA-KR). Existing variants of standard Multi-Head Attention (MHA), including SOTA methods like MLA, fail to maintain comparably strong performance under stringent Key-Value cache (KV cache) constraints. MFA enhances model capacity by efficiently scaling up both the number and dimension of attention heads through low-rank matrix factorization in the Query-Key (QK) circuit. Extending MFA, MFA-KR further reduces memory requirements by repurposing the key cache as the value through value projection re-parameterization. MFA’s design enables strong model capacity under a tight KV cache budget, while MFA-KR is suitable for even harsher KV cache limits with a minor performance trade-off. Notably, in our extensive and large-scale experiments, the proposed architecture outperforms MLA and performs comparably to MHA, while reducing KV cache usage by up to 56% and 93.7%, respectively.
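To make the KV cache figures concrete, the following is a minimal back-of-the-envelope sketch in Python. It is not taken from the paper: the model configuration (24 layers, 16 KV heads, head dimension 128, 2-byte elements) and the helper names `kv_cache_bytes_per_token` and `after_reduction` are hypothetical, chosen only for illustration. The sketch reads the abstract's "respectively" as pairing the 93.7% reduction with the MHA baseline.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes cached per token for standard MHA-style attention.

    Each layer stores one key vector and one value vector per KV head,
    hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def after_reduction(baseline_bytes, reduction_pct):
    """Cache size that remains after cutting the baseline by reduction_pct percent."""
    return baseline_bytes * (100 - reduction_pct) / 100


# Hypothetical MHA baseline; these numbers are NOT from the paper.
mha_bytes = kv_cache_bytes_per_token(n_layers=24, n_kv_heads=16, head_dim=128)

# Per-token budget implied by an "up to 93.7%" reduction relative to that baseline.
budget_bytes = after_reduction(mha_bytes, 93.7)

print(mha_bytes)            # 196608 bytes per token for the assumed baseline
print(round(budget_bytes))  # remaining per-token budget, rounded to whole bytes
```

Under these assumed settings, a 93.7% reduction leaves roughly one sixteenth of the baseline per-token cache. The actual savings depend on the configurations evaluated in the paper, which the abstract does not list.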

Anthology ID: 2025.findings-acl.1288
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 25114–25126
URL: https://aclanthology.org/2025.findings-acl.1288/
DOI: 10.18653/v1/2025.findings-acl.1288
Cite (ACL): Jingcheng Hu, Houyi Li, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, and Heung-Yeung Shum. 2025. Multi-matrix Factorization Attention. In Findings of the Association for Computational Linguistics: ACL 2025, pages 25114–25126, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Multi-matrix Factorization Attention (Hu et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-acl.1288.pdf
