Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

Julia Belikova; Danila Rozhevskii; Dennis Svirin; Konstantin Polev; Alexander Panchenko

Abstract

Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures promise to extend effective context length by replacing long token sequences with smaller sets of learned compressed tokens. Yet, the limits of compressibility – and when compression begins to erase task-relevant content – remain underexplored. In this paper, we define token overflow as a regime in which compressed representations no longer contain sufficient information to answer a given query, and propose a methodology to characterize and detect it. In the xRAG soft-compression setting, we find that query-agnostic saturation statistics reliably separate compressed from uncompressed token representations, providing a practical tool for identifying compressed tokens but showing limited overflow detection capability. Lightweight probing classifiers over both query and context xRAG representations detect overflow with 0.72 AUC-ROC on average on HotpotQA, SQuADv2, and TriviaQA datasets, demonstrating that incorporating query information improves detection performance. These results advance from query-independent diagnostics to query-aware detectors, enabling low-cost pre-LLM gating to mitigate compression-induced errors.
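
As a rough illustration of the query-aware probing idea summarized above, the following is a minimal sketch and not the authors' implementation. It assumes each example is a query vector plus a compressed-context vector, concatenates them, fits a logistic-regression probe, scores it with AUC-ROC, and uses the probe as a pre-LLM gate. All function names, the synthetic data, the labeling rule, and the 0.5 threshold are hypothetical choices made only for this example.

```python
import math
import random


def make_features(query_vec, context_vec):
    """Concatenate query and compressed-context vectors into one probe input."""
    return list(query_vec) + list(context_vec)


def predict_proba(weights, bias, x):
    """Logistic probe: estimated probability that the example overflows."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit the probe with plain batch gradient descent on the log loss."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def should_bypass_compression(weights, bias, query_vec, context_vec, threshold=0.5):
    """Gate: True means the probe predicts overflow, so skip the compressed path."""
    x = make_features(query_vec, context_vec)
    return predict_proba(weights, bias, x) >= threshold


def synthetic_dataset(n=200, dim=2, seed=0):
    """Toy stand-in for real representations; the label rule is an assumption."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        q = [rng.gauss(0, 1) for _ in range(dim)]
        c = [rng.gauss(0, 1) for _ in range(dim)]
        features.append(make_features(q, c))
        labels.append(1 if q[0] + c[0] > 0 else 0)
    return features, labels


if __name__ == "__main__":
    X, y = synthetic_dataset()
    w, b = train_probe(X, y)
    scores = [predict_proba(w, b, x) for x in X]
    print(f"train AUC-ROC on synthetic data: {auc_roc(scores, y):.3f}")
```

In practice the inputs would be real query and compressed-token representations with overflow labels derived from downstream answer correctness, and the probe would be evaluated on held-out data rather than on its training set.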

Anthology ID: 2026.eacl-srw.59
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 797–810
URL: https://aclanthology.org/2026.eacl-srw.59/
Cite (ACL): Julia Belikova, Danila Rozhevskii, Dennis Svirin, Konstantin Polev, and Alexander Panchenko. 2026. Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 797–810, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation (Belikova et al., EACL 2026)
PDF: https://aclanthology.org/2026.eacl-srw.59.pdf
