Anchoring the Cache: Mitigating Contextual Hallucination in KV-Compressed Long-Context Summarization

Yu Fu; Chen Luo; Josef Valvoda; Xin Zhang; Xuejing Lei; Xiao Pan; Hui Liu; Yue Dong

Abstract

Key-Value (KV) cache compression techniques have improved the efficiency of long-context summarization in Large Language Models (LLMs), but their impact on model hallucination remains underexplored. In this paper, we present the first systematic study of how KV cache compression affects hallucination in long-context summarization, demonstrating that aggressive compression can increase hallucination scores by up to 3.36× compared to the uncompressed baseline. To mitigate this issue, we propose HalluKV, a decoding-phase strategy that selectively removes generated KV pairs from retrieval heads, the attention heads responsible for retrieving critical information from the source context, thereby anchoring their attention on the preserved source information. Our approach maintains computational efficiency while significantly reducing hallucination across multiple models and datasets, achieving average reductions of up to 5.48 points on Llama-3-8B-Instruct and enabling more trustworthy long-context summarization.
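
The core idea in the abstract (drop generated KV pairs from retrieval heads so that those heads attend only to preserved source tokens) can be illustrated with a toy sketch. This is not the authors' implementation: the cache layout, the function name `anchor_retrieval_heads`, the `source_len` boundary, and the choice to drop every generated entry are all assumptions made for illustration. The abstract says the removal is selective, so the actual selection rule may be less aggressive than what is shown here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cache entry: (token_position, key_vector, value_vector).
KVEntry = Tuple[int, List[float], List[float]]


def anchor_retrieval_heads(
    cache: Dict[int, List[KVEntry]],
    retrieval_heads: Set[int],
    source_len: int,
) -> Dict[int, List[KVEntry]]:
    """Return a copy of the cache with generated-token entries removed
    from the retrieval heads only.

    Assumption of this sketch: positions below source_len belong to the
    source context, and all later positions were produced during decoding.
    """
    anchored: Dict[int, List[KVEntry]] = {}
    for head, entries in cache.items():
        if head in retrieval_heads:
            # Keep only the (already compressed) source entries.
            anchored[head] = [e for e in entries if e[0] < source_len]
        else:
            # Other heads keep their full cache, generated tokens included.
            anchored[head] = list(entries)
    return anchored


# Toy cache: source positions 0, 2, 5 survived compression (source_len = 6),
# and positions 6 and 7 were generated during decoding.
entries = [
    (0, [0.1], [1.0]),
    (2, [0.2], [1.1]),
    (5, [0.3], [1.2]),
    (6, [0.4], [1.3]),
    (7, [0.5], [1.4]),
]
cache = {0: list(entries), 1: list(entries)}

# Head 0 is treated as a retrieval head here purely for illustration.
anchored = anchor_retrieval_heads(cache, retrieval_heads={0}, source_len=6)
positions = {h: [pos for pos, _, _ in kv] for h, kv in anchored.items()}
print(positions)  # {0: [0, 2, 5], 1: [0, 2, 5, 6, 7]}
```

In this toy setup, head 0 can only attend to source positions after anchoring, while head 1 still sees the generated tokens. How retrieval heads are identified is not described in the abstract and is left out of the sketch.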

Anthology ID: 2026.acl-long.1542
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 33397–33413
URL: https://aclanthology.org/2026.acl-long.1542/
Cite (ACL): Yu Fu, Chen Luo, Josef Valvoda, Xin Zhang, Xuejing Lei, Xiao Pan, Hui Liu, and Yue Dong. 2026. Anchoring the Cache: Mitigating Contextual Hallucination in KV-Compressed Long-Context Summarization. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33397–33413, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Anchoring the Cache: Mitigating Contextual Hallucination in KV-Compressed Long-Context Summarization (Fu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1542.pdf
Checklist: 2026.acl-long.1542.checklist.pdf
