ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training

Maryam Dialameh, Rezaul Karim, Hossein Rajabzadeh, Omar Mohamed Awad, Boxing Chen, Hyock Ju Kwon, Walid Ahmed, Yang Liu


Abstract
This paper introduces ECHO-LLaMA, an efficient variant of the LLaMA architecture designed to improve both training speed and inference throughput while maintaining learning capacity. ECHO-LLaMA transforms LLaMA models to share KV caches across certain layers, significantly reducing KV computational complexity while maintaining or improving language performance. Experimental results demonstrate that ECHO-LLaMA achieves up to 77% higher token-per-second throughput during training, up to 16% higher Model FLOPs Utilization (MFU), and up to 14% lower loss when trained on an equal number of tokens. Furthermore, on the 1.1B model, ECHO-LLaMA delivers approximately 7% higher test-time throughput than the baseline. By introducing a computationally efficient adaptation mechanism, ECHO-LLaMA offers a scalable and cost-effective solution for pretraining and finetuning large language models, enabling faster and more resource-efficient training without compromising performance.
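The cross-layer KV sharing described in the abstract can be illustrated with a minimal sketch (not the authors' released code): a subset of attention layers act as KV "producers" that compute key/value projections, while the remaining layers reuse the most recently produced K/V tensors instead of computing their own. All names and the specific sharing pattern below (SharedKVAttention, share_every, which layers produce K/V) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of cross-layer KV sharing, assuming a "producer/consumer" layer
# pattern. Residual connections, RoPE, MLPs, and grouped-query attention are
# omitted for brevity; all names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedKVAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, computes_kv: bool):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        self.computes_kv = computes_kv
        if computes_kv:  # only "producer" layers own K/V projections
            self.k_proj = nn.Linear(d_model, d_model, bias=False)
            self.v_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, shared_kv=None):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        if self.computes_kv:
            k = self.k_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            v = self.v_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            shared_kv = (k, v)   # cache K/V for downstream consumer layers
        else:
            k, v = shared_kv     # reuse K/V from the most recent producer layer
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), shared_kv


def build_layers(n_layers=8, d_model=512, n_heads=8, share_every=2):
    # Every `share_every`-th layer computes K/V; the layers in between reuse it.
    return nn.ModuleList(
        SharedKVAttention(d_model, n_heads, computes_kv=(i % share_every == 0))
        for i in range(n_layers)
    )


if __name__ == "__main__":
    layers = build_layers()
    x = torch.randn(1, 16, 512)
    shared_kv = None
    for layer in layers:
        x, shared_kv = layer(x, shared_kv)
    print(x.shape)  # torch.Size([1, 16, 512])
```

In this sketch the consumer layers skip their K/V projection GEMMs and do not store their own K/V tensors, which is consistent with the reduced KV computational complexity the abstract mentions as the source of the training-throughput and MFU gains.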
Anthology ID: 2025.emnlp-industry.156
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2252–2269
URL: https://aclanthology.org/2025.emnlp-industry.156/
Cite (ACL): Maryam Dialameh, Rezaul Karim, Hossein Rajabzadeh, Omar Mohamed Awad, Boxing Chen, Hyock Ju Kwon, Walid Ahmed, and Yang Liu. 2025. ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 2252–2269, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training (Dialameh et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-industry.156.pdf