In-Context Former: Lightning-fast Compressing Context for Large Language Model

Xiangfeng Wang, Zaiyi Chen, Tong Xu, Zheyong Xie, Yongyi He, Enhong Chen


Abstract
With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach to mitigating these costs is compressing the long input contexts. Existing methods typically leverage the self-attention mechanism of the large model itself for context compression. While these methods have achieved notable results, the compression process still entails quadratic complexity. To address this limitation, we propose the In-Context Former (IC-Former), which does not rely on the target large model but instead uses a cross-attention mechanism to extract and condense information from the contextual embeddings. The computational overhead of our method grows linearly with the compression range. Experimental results indicate that our method requires only 1/32 of the baseline's floating-point operations during compression and improves processing speed by 68 to 112 times, while achieving 90% of the baseline performance on evaluation metrics. Additionally, IC-Former exhibits strong regularity in its interactions with the context, enhancing its interpretability. Overall, IC-Former significantly reduces compression costs, making real-time compression scenarios feasible.
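To make the core idea concrete, the sketch below (PyTorch) shows one way a small set of learned digest embeddings can compress a long sequence of context embeddings via cross-attention, with cost linear in the context length for a fixed number of digest tokens. This is a minimal illustration based only on the abstract, not the authors' released implementation; the class name `ICFormerSketch` and parameters such as `num_digest_tokens` are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): learned "digest" tokens attend to
# frozen context embeddings via cross-attention, producing a fixed-size
# compressed representation. Cost is O(k * n) for k digest tokens and n
# context positions, i.e. linear in the context length.
import torch
import torch.nn as nn


class ICFormerSketch(nn.Module):
    def __init__(self, hidden_dim: int = 768, num_digest_tokens: int = 16, num_heads: int = 8):
        super().__init__()
        # Learned query vectors that will hold the compressed context.
        self.digest_tokens = nn.Parameter(torch.randn(num_digest_tokens, hidden_dim) * 0.02)
        # Cross-attention: digest tokens (queries) attend to context embeddings (keys/values).
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, context_embeds: torch.Tensor) -> torch.Tensor:
        # context_embeds: (batch, context_len, hidden_dim), e.g. the LLM's input embeddings.
        batch = context_embeds.size(0)
        queries = self.digest_tokens.unsqueeze(0).expand(batch, -1, -1)
        compressed, _ = self.cross_attn(queries, context_embeds, context_embeds)
        return self.norm(compressed)  # (batch, num_digest_tokens, hidden_dim)


if __name__ == "__main__":
    compressor = ICFormerSketch()
    ctx = torch.randn(2, 512, 768)   # embeddings of a 512-token context
    digests = compressor(ctx)
    print(digests.shape)             # torch.Size([2, 16, 768])
```

The key design point, as described in the abstract, is that the queries are a small fixed set of learned vectors rather than the full context, so attention never forms an n-by-n score matrix over the context.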
Anthology ID: 2024.findings-emnlp.138
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2445–2460
URL: https://aclanthology.org/2024.findings-emnlp.138
Cite (ACL): Xiangfeng Wang, Zaiyi Chen, Tong Xu, Zheyong Xie, Yongyi He, and Enhong Chen. 2024. In-Context Former: Lightning-fast Compressing Context for Large Language Model. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2445–2460, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): In-Context Former: Lightning-fast Compressing Context for Large Language Model (Wang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.138.pdf