Disentangle to Decay: Linear Attention with Trainable Decay Factor

Haibo Tong, Chenyang Zhang, Jiayi Lin, Bingxuan Hou, Qingqing Hong, Junli Wang


Abstract
Linear attention improves the inference efficiency of the Transformer and has attracted research interest as an efficient backbone for language models. Existing linear-attention-based models usually exploit decay-factor-based positional encoding (PE), where attention scores decay exponentially with increasing relative distance. However, most work manually designs a non-trainable decay factor for the exponential computation, which limits further optimization. Our analysis reveals that directly training the decay factor is unstable because of large gradients. To address this, we propose a novel PE for linear attention named Disentangle to Decay (D2D). D2D disentangles the decay factor into two parts to achieve further optimization and stable training. Moreover, D2D can be transformed into a recurrent form for efficient inference. Experiments demonstrate that D2D achieves stable training of the decay factor and enhances the performance of linear attention both at normal context lengths and in length-extrapolation scenarios.
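
The abstract refers to attention scores that decay exponentially with relative distance and to a recurrent form for efficient inference. The sketch below illustrates generic decay-factor-based linear attention in recurrent form with a single scalar, directly trainable decay; it is an assumption-laden illustration, not the paper's D2D parameterization, and the function name and tensor shapes are hypothetical.

import torch

def decayed_linear_attention(q, k, v, log_decay):
    # Recurrent-form linear attention with an exponential decay factor (illustrative, not D2D).
    # q, k: (seq_len, d_k); v: (seq_len, d_v).
    # log_decay: scalar tensor; the contribution of position j to position i
    # is weighted by exp(log_decay) ** (i - j), i.e. it decays with relative distance.
    decay = torch.exp(log_decay)                 # per-step decay factor in (0, 1] when log_decay <= 0
    state = torch.zeros(k.shape[1], v.shape[1])  # running sum of decayed outer products k_t v_t^T
    outputs = []
    for t in range(q.shape[0]):
        state = decay * state + torch.outer(k[t], v[t])  # decay older context, add the new token
        outputs.append(q[t] @ state)                     # O(d_k * d_v) per step, independent of seq_len
    return torch.stack(outputs)

# Example usage with random tensors and hypothetical shapes:
q = torch.randn(8, 16); k = torch.randn(8, 16); v = torch.randn(8, 32)
out = decayed_linear_attention(q, k, v, torch.tensor(-0.1))  # out: (8, 32)

In this recurrent form the per-token cost does not grow with sequence length, which is the efficiency property the abstract attributes to linear attention; training log_decay directly in such a setup is what the paper identifies as unstable.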
Anthology ID:
2025.coling-main.660
Volume:
Proceedings of the 31st International Conference on Computational Linguistics
Month:
January
Year:
2025
Address:
Abu Dhabi, UAE
Editors:
Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
9877–9890
URL:
https://aclanthology.org/2025.coling-main.660/
Cite (ACL):
Haibo Tong, Chenyang Zhang, Jiayi Lin, Bingxuan Hou, Qingqing Hong, and Junli Wang. 2025. Disentangle to Decay: Linear Attention with Trainable Decay Factor. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9877–9890, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Disentangle to Decay: Linear Attention with Trainable Decay Factor (Tong et al., COLING 2025)
PDF:
https://aclanthology.org/2025.coling-main.660.pdf