MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers

Linrui Ma; Chun Hei Lo; Xinyu Wang; Peng Lu; Xihao Yuan; Hanting Chen; Kai Han; Xinghao Chen; Chengjun Zhan; Hanlin xu; Yichun Yin; Lifeng Shang; Feng Wen; Boxing Chen; Yufei Cui

Abstract

The quadratic computational cost of traditional attention mechanisms poses a major bottleneck to the scalability and practical deployment of large language models (LLMs), particularly in long-context scenarios. To improve efficiency, existing approaches often enforce rigid structural constraints such as local attention windows. However, these strategies typically lead to substantial performance degradation on tasks requiring precise long-range recall. In this work, we propose MATCH, a scalable and efficient framework that augments sparsified attention mechanisms with dynamically integrated in-context information through an efficient retrieval system. Empirical results show that MATCH significantly improves the performance of sparse-attention models on both synthetic and real-world natural-language tasks. These findings highlight the versatility of MATCH as a general approach for enhancing in-context retrieval capabilities while maintaining the efficiency benefits of sparse attention architectures.

Anthology ID: 2026.acl-long.692
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15165–15179
URL: https://aclanthology.org/2026.acl-long.692/
Cite (ACL): Linrui Ma, Chun Hei Lo, Xinyu Wang, Peng Lu, Xihao Yuan, Hanting Chen, Kai Han, Xinghao Chen, Chengjun Zhan, Hanlin xu, Yichun Yin, Lifeng Shang, Feng Wen, Boxing Chen, and Yufei Cui. 2026. MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15165–15179, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers (Ma et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.692.pdf
Checklist: 2026.acl-long.692.checklist.pdf
