LLoCO: Learning Long Contexts Offline

Sijun Tan, Xiuyu Li, Shishir G Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph Gonzalez, Raluca Popa


Abstract
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a novel approach that addresses this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. Our approach extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using 30× fewer tokens during inference. LLoCO achieves up to a 7.62× speedup during inference and 11.52× higher throughput during finetuning, substantially reducing the cost of long-document question answering. This makes it a promising solution for efficient long-context processing.
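To make the in-domain parameter-efficient finetuning component of the abstract concrete, the sketch below attaches a LoRA adapter to a LLaMA2-7B checkpoint with the Hugging Face peft library. It is a minimal illustration, not the authors' implementation: the offline context compressor is replaced by a placeholder summary string, and the checkpoint name, LoRA rank, and target modules are illustrative assumptions rather than the configuration reported in the paper.

```python
# Minimal sketch of LoRA-based parameter-efficient finetuning on LLaMA2-7B,
# assuming Hugging Face transformers + peft. Hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# LoRA trains only small low-rank matrices, so a lightweight adapter can be
# learned per document domain while the 7B base model stays frozen.
lora_cfg = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                      # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# LLoCO conditions the model on a compressed representation of the long
# document instead of the raw tokens; a placeholder string stands in for it here.
compressed_context = "<summary tokens standing in for the compressed document>"
question = "What approach does the document propose for long-context processing?"
prompt = f"{compressed_context}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```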
Anthology ID: 2024.emnlp-main.975
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 17605–17621
URL: https://aclanthology.org/2024.emnlp-main.975
Cite (ACL): Sijun Tan, Xiuyu Li, Shishir G Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph Gonzalez, and Raluca Popa. 2024. LLoCO: Learning Long Contexts Offline. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17605–17621, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): LLoCO: Learning Long Contexts Offline (Tan et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.975.pdf