@article{lee-etal-2025-tale,
title = "{TALE}: Token-Adaptive Low-Rank {KVC}ache Approximation with Reconstruction Elimination",
author = "Lee, Jaeseong and
Hwang, Seung-won and
Qiao, Aurick and
Campos, Daniel and
Yao, Zhewei and
He, Yuxiong",
journal = "Transactions of the Association for Computational Linguistics",
volume = "13",
year = "2025",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/2025.tacl-1.59/",
doi = "10.1162/tacl.a.39",
pages = "1298--1318",
abstract = "KVCache, by storing key-value pairs for reuse, has been crucial for enhancing inference efficiency for large language models (LLMs). However, the increasing memory demands of KVCache, especially with recent trends of longer input sequences, present a major challenge. In this work, we propose an innovative token-adaptive low-rank approximation strategy for KVCache compression. By applying varying ranks based on token significance, our method compresses KVCache efficiently while retaining critical information. Moreover, we introduce a lazy approximation technique, which approximates lazily only when needed, alongside a reconstruction-free design to bypass costly recalculations. Combined with multi-level quantization, this method reduces KVCache size by 9.1{\texttimes} on the Llama-3.1-8B model, with minimal performance degradation on complex tasks such as GSM8K. Moreover, our custom attention implementation shows up to 2{\texttimes} latency reduction compared to the conventional method in long context scenarios. The code is publicly available."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="lee-etal-2025-tale">
<titleInfo>
<title>TALE: Token-Adaptive Low-Rank KVCache Approximation with Reconstruction Elimination</title>
</titleInfo>
<name type="personal">
<namePart type="given">Jaeseong</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seung-won</namePart>
<namePart type="family">Hwang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aurick</namePart>
<namePart type="family">Qiao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Daniel</namePart>
<namePart type="family">Campos</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhewei</namePart>
<namePart type="family">Yao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuxiong</namePart>
<namePart type="family">He</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<genre authority="bibutilsgt">journal article</genre>
<relatedItem type="host">
<titleInfo>
<title>Transactions of the Association for Computational Linguistics</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
<publisher>MIT Press</publisher>
<place>
<placeTerm type="text">Cambridge, MA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">periodical</genre>
<genre authority="bibutilsgt">academic journal</genre>
</relatedItem>
<abstract>KVCache, by storing key-value pairs for reuse, has been crucial for enhancing inference efficiency for large language models (LLMs). However, the increasing memory demands of KVCache, especially with recent trends of longer input sequences, present a major challenge. In this work, we propose an innovative token-adaptive low-rank approximation strategy for KVCache compression. By applying varying ranks based on token significance, our method compresses KVCache efficiently while retaining critical information. Moreover, we introduce a lazy approximation technique, which approximates lazily only when needed, alongside a reconstruction-free design to bypass costly recalculations. Combined with multi-level quantization, this method reduces KVCache size by 9.1× on the Llama-3.1-8B model, with minimal performance degradation on complex tasks such as GSM8K. Moreover, our custom attention implementation shows up to 2× latency reduction compared to the conventional method in long context scenarios. The code is publicly available.</abstract>
<identifier type="citekey">lee-etal-2025-tale</identifier>
<identifier type="doi">10.1162/tacl.a.39</identifier>
<location>
<url>https://aclanthology.org/2025.tacl-1.59/</url>
</location>
<part>
<date>2025</date>
<detail type="volume"><number>13</number></detail>
<extent unit="page">
<start>1298</start>
<end>1318</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Journal Article
%T TALE: Token-Adaptive Low-Rank KVCache Approximation with Reconstruction Elimination
%A Lee, Jaeseong
%A Hwang, Seung-won
%A Qiao, Aurick
%A Campos, Daniel
%A Yao, Zhewei
%A He, Yuxiong
%J Transactions of the Association for Computational Linguistics
%D 2025
%V 13
%I MIT Press
%C Cambridge, MA
%F lee-etal-2025-tale
%X KVCache, by storing key-value pairs for reuse, has been crucial for enhancing inference efficiency for large language models (LLMs). However, the increasing memory demands of KVCache, especially with recent trends of longer input sequences, present a major challenge. In this work, we propose an innovative token-adaptive low-rank approximation strategy for KVCache compression. By applying varying ranks based on token significance, our method compresses KVCache efficiently while retaining critical information. Moreover, we introduce a lazy approximation technique, which approximates lazily only when needed, alongside a reconstruction-free design to bypass costly recalculations. Combined with multi-level quantization, this method reduces KVCache size by 9.1× on the Llama-3.1-8B model, with minimal performance degradation on complex tasks such as GSM8K. Moreover, our custom attention implementation shows up to 2× latency reduction compared to the conventional method in long context scenarios. The code is publicly available.
%R 10.1162/tacl.a.39
%U https://aclanthology.org/2025.tacl-1.59/
%U https://doi.org/10.1162/tacl.a.39
%P 1298-1318
Markdown (Informal)
[TALE: Token-Adaptive Low-Rank KVCache Approximation with Reconstruction Elimination](https://aclanthology.org/2025.tacl-1.59/) (Lee et al., TACL 2025)
ACL