E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Zihan Liao; Jun Wang; Hang Yu; Lingxiao Wei; Jianguo Li; Jun Wang; Wei Zhang

doi:10.18653/v1/2025.emnlp-main.970

Abstract

Processing long contexts is increasingly important for Large Language Models (LLMs) in tasks such as multi-turn dialogue, code generation, and document summarization. This paper addresses the challenge of simultaneously achieving high long-context performance, low computational complexity, and compatibility with pretrained models, three goals collectively termed the “impossible triangle”. We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this trade-off. E2LLM divides a long context into chunks, compresses each chunk into soft prompts using a pretrained text encoder, and aligns these representations with a decoder-only LLM via an adapter. To enhance the LLM’s reasoning over these soft prompts, we employ two training objectives: encoder output reconstruction and long-context instruction fine-tuning. Extensive experiments show that E2LLM not only outperforms 8 state-of-the-art (SOTA) methods in effectiveness and efficiency on document summarization and question answering, but also achieves the best performance on LongBench v2 among models of comparable size. The source code is available at https://github.com/codefuse-ai/E2LLM.
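The chunk, compress, and align pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration and is not taken from the E2LLM code base: the names `chunk_tokens`, `toy_encoder`, `make_adapter`, `apply_adapter`, and `compress_context` are hypothetical, the "encoder" is a deterministic stand-in rather than a pretrained model, the adapter is a single random linear map, and each chunk is reduced to exactly one soft-prompt vector. The two training objectives are not shown.

```python
import random
from typing import List

Vector = List[float]


def chunk_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a long token sequence into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_encoder(chunk: List[str], dim: int) -> Vector:
    """Hypothetical stand-in for a pretrained text encoder.

    Each token gets a pseudo-embedding seeded by its string, and the chunk
    vector is the mean of those embeddings.
    """
    total = [0.0] * dim
    for tok in chunk:
        rng = random.Random(tok)  # deterministic per token string
        for j in range(dim):
            total[j] += rng.uniform(-1.0, 1.0)
    return [x / len(chunk) for x in total]


def make_adapter(enc_dim: int, llm_dim: int, seed: int = 0) -> List[Vector]:
    """Random linear map (llm_dim x enc_dim); a trained adapter is assumed in practice."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(enc_dim)] for _ in range(llm_dim)]


def apply_adapter(weights: List[Vector], vec: Vector) -> Vector:
    """Project an encoder vector into the (assumed) LLM embedding space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def compress_context(tokens: List[str], chunk_size: int,
                     enc_dim: int, llm_dim: int) -> List[Vector]:
    """Return one soft-prompt vector per chunk of the long context."""
    adapter = make_adapter(enc_dim, llm_dim)
    return [apply_adapter(adapter, toy_encoder(chunk, enc_dim))
            for chunk in chunk_tokens(tokens, chunk_size)]


if __name__ == "__main__":
    context = [f"tok{i}" for i in range(1000)]
    soft_prompts = compress_context(context, chunk_size=128, enc_dim=16, llm_dim=32)
    # The decoder would now attend over len(soft_prompts) vectors
    # instead of len(context) token embeddings.
    print(len(context), "tokens ->", len(soft_prompts), "soft prompts")
```

In this toy setting, a 1000-token context with a chunk size of 128 is reduced to 8 vectors, which is the source of the efficiency gain the abstract refers to: the decoder's input length scales with the number of chunks rather than the number of tokens.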

Anthology ID: 2025.emnlp-main.970
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 19201–19230
URL: https://aclanthology.org/2025.emnlp-main.970/
DOI: 10.18653/v1/2025.emnlp-main.970
Cite (ACL): Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, and Wei Zhang. 2025. E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 19201–19230, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning (Liao et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.970.pdf
Checklist: 2025.emnlp-main.970.checklist.pdf
