EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices

Jiyu Chen; Shuang Peng; Daxiong Luo; Fan Yang; Renshou Wu; Fangyuan Li; Xiaoxin Chen

doi:10.18653/v1/2025.acl-industry.40

Abstract

Transformer-based large language models (LLMs) encounter challenges in processing long sequences on edge devices due to the quadratic complexity of attention mechanisms and the growing memory demands of the Key-Value (KV) cache. Existing KV cache optimizations struggle with irreversible token eviction in long-output tasks, while alternative sequence modeling architectures prove costly to adopt within established Transformer infrastructure. We present EdgeInfinite, a memory-efficient solution for infinite contexts that integrates compressed memory into Transformer-based LLMs through a trainable memory-gating module. This approach maintains full compatibility with standard Transformer architectures, requires fine-tuning of only a small fraction of the parameters, and enables selective activation of the memory-gating module to route between long-context and short-context tasks. Experimental results show that EdgeInfinite achieves performance comparable to the baseline Transformer-based LLM on long-context benchmarks while optimizing memory consumption and time to first token.
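To make the KV cache concern in the abstract concrete, the following minimal sketch estimates how a standard, uncompressed KV cache grows with sequence length. It is not taken from the paper: the function name `kv_cache_bytes` and the model configuration are hypothetical, chosen only for illustration, and the estimate assumes 16-bit values and a single sequence.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Approximate bytes needed to cache keys and values for one sequence."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model configuration (an assumption, not from the paper).
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for seq_len in (1_024, 8_192, 65_536):
    mib = kv_cache_bytes(seq_len, **config) / 2**20
    print(f"{seq_len:>6} tokens -> {mib:,.0f} MiB")
```

Under these assumptions the cache costs 128 KiB per token, so it grows linearly from 128 MiB at 1,024 tokens to 8,192 MiB at 65,536 tokens. That growth is the pressure on edge-device memory that the abstract describes.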

Anthology ID: 2025.acl-industry.40
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 568–575
URL: https://aclanthology.org/2025.acl-industry.40/
DOI: 10.18653/v1/2025.acl-industry.40
Cite (ACL): Jiyu Chen, Shuang Peng, Daxiong Luo, Fan Yang, Renshou Wu, Fangyuan Li, and Xiaoxin Chen. 2025. EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 568–575, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): EdgeInfinite: A Memory-Efficient Infinite-Context Transformer for Edge Devices (Chen et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-industry.40.pdf
