SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression

Yiqiao Jin; Kartik Sharma; Vineeth Rakesh; Yingtong Dou; Menghai Pan; Mahashweta Das; Srijan Kumar

Abstract

Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but it must balance limited effective context, redundant retrieved evidence, and the loss of fine-grained facts under aggressive compression. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a hybrid RAG framework that targets answer quality under fixed token budgets by combining natural-language snippets with semantic compression vectors. SARA retains a small set of passages in text form to preserve entities and numerical values, compresses the remaining evidence into interpretable vectors for broader coverage, and uses those vectors for iterative evidence reranking. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
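The hybrid idea in the abstract (a few passages kept as text, the rest kept as vectors, with vector-based reranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Passage` type, the `rerank` and `build_context` helpers, the `n_text`, `vector_cost`, and `novelty_weight` parameters, and the use of plain float lists as stand-ins for compression vectors are all assumptions made for illustration. The reranking step here is a simple greedy, MMR-style rule (relevance minus redundancy), swapped in as a placeholder for the paper's iterative evidence reranking.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Passage:
    text: str
    vector: list[float]  # hypothetical stand-in for a compression vector
    n_tokens: int        # token length of the passage in text form


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, passages, novelty_weight=0.5):
    """Greedy MMR-style ordering: favor relevance, penalize redundancy
    with passages that were already picked (an assumed placeholder rule)."""
    remaining = list(passages)
    ordered = []
    while remaining:
        def score(p):
            relevance = cosine(query_vec, p.vector)
            redundancy = max(
                (cosine(p.vector, q.vector) for q in ordered), default=0.0
            )
            return relevance - novelty_weight * redundancy

        best = max(remaining, key=score)
        remaining.remove(best)
        ordered.append(best)
    return ordered


def build_context(query_vec, passages, token_budget, n_text=1, vector_cost=1):
    """Keep up to n_text top-ranked passages as text; represent the rest
    as vectors, each assumed to cost vector_cost tokens of the budget."""
    as_text, as_vectors, used = [], [], 0
    for p in rerank(query_vec, passages):
        if len(as_text) < n_text and used + p.n_tokens <= token_budget:
            as_text.append(p)
            used += p.n_tokens
        elif used + vector_cost <= token_budget:
            as_vectors.append(p)
            used += vector_cost
    return as_text, as_vectors, used
```

With three toy passages where two are near-duplicates, the redundant one drops to the end of the ranking, and under a tight budget only the top passage stays in text form while the others are carried as vectors. The fixed per-vector token cost is a simplification; the actual cost depends on how compressed representations are fed to the model.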

Anthology ID: 2026.acl-long.661
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 14508–14528
URL: https://aclanthology.org/2026.acl-long.661/
Cite (ACL): Yiqiao Jin, Kartik Sharma, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, and Srijan Kumar. 2026. SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14508–14528, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression (Jin et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.661.pdf
Checklist: 2026.acl-long.661.checklist.pdf
