Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking

Wuwei Zhang; Fangcong Yin; Howard Yen; Danqi Chen; Xi Ye

doi:10.18653/v1/2025.emnlp-main.1214

Abstract

Recent work has identified retrieval heads (Wu et al., 2025), a subset of attention heads responsible for retrieving salient information in long-context language models (LMs), as measured by their copy-paste behavior in Needle-in-a-Haystack tasks. In this paper, we introduce QRHead (Query-Focused Retrieval Head), an improved set of attention heads that enhance retrieval from long context. We identify QRHead by aggregating attention scores with respect to the input query, using a handful of examples from real-world tasks (e.g., long-context QA). We further introduce QRRetriever, an efficient and effective retriever that uses the accumulated attention mass of QRHead as retrieval scores. We use QRRetriever for long-context reasoning by selecting the most relevant parts with the highest retrieval scores. On multi-hop reasoning tasks LongMemEval and CLIPPER, this yields over 10% performance gains over full context and outperforms strong dense retrievers. We also evaluate QRRetriever as a re-ranker on the BEIR benchmark and find that it achieves strong zero-shot performance, outperforming other LLM-based re-rankers such as RankGPT. Further analysis shows that both the query-context attention scoring and task selection are crucial for identifying QRHead with strong downstream utility. Overall, our work contributes a general-purpose retriever and offers interpretability insights into the long-context capabilities of LMs.
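The scoring idea in the abstract (rank passages by the attention mass that query tokens place on them, using only a chosen subset of heads) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not give the exact formula, normalization, or head-selection procedure, and the function names (`head_mass`, `rank_heads`, `passage_scores`, `top_k`), the nested-list attention layout, and the hand-written attention values are hypothetical rather than taken from the paper or its code.

```python
from typing import List, Sequence, Tuple

# Assumed layout: attn[h][q][c] is the attention weight that head h assigns
# from query token q to context token c. Real models expose tensors instead.
Attention = Sequence[Sequence[Sequence[float]]]
Span = Tuple[int, int]  # half-open [start, end) token range of one passage


def head_mass(attn_head: Sequence[Sequence[float]], span: Span) -> float:
    """Total attention one head sends from all query tokens into a span."""
    start, end = span
    return sum(sum(row[start:end]) for row in attn_head)


def rank_heads(examples: List[Tuple[Attention, List[Span], int]]) -> List[int]:
    """Order heads by attention mass on the gold passage, summed over a
    handful of labeled examples (attn, spans, gold_index). Illustrative only."""
    num_heads = len(examples[0][0])
    totals = [0.0] * num_heads
    for attn, spans, gold in examples:
        for h in range(num_heads):
            totals[h] += head_mass(attn[h], spans[gold])
    return sorted(range(num_heads), key=lambda h: totals[h], reverse=True)


def passage_scores(attn: Attention, spans: List[Span], heads: List[int]) -> List[float]:
    """Score each passage by mean attention mass over the selected heads."""
    return [sum(head_mass(attn[h], span) for h in heads) / len(heads) for span in spans]


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring passages, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy input: 2 heads, 2 query tokens, 4 context tokens, 2 passages.
toy_attn = [
    [[0.1, 0.1, 0.4, 0.4], [0.2, 0.0, 0.5, 0.3]],          # head 0: favors passage 1
    [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],  # head 1: uniform
]
toy_spans = [(0, 2), (2, 4)]
toy_examples = [(toy_attn, toy_spans, 1)]  # passage 1 is the gold passage

selected = rank_heads(toy_examples)[:1]    # keep the single best head
scores = passage_scores(toy_attn, toy_spans, selected)
best = top_k(scores, 1)
```

In this toy setup the uniform head carries no signal about which passage matters, so selecting heads on a few labeled examples before scoring is what separates the two passages. A real pipeline would read attention weights from the model and would need to settle details the abstract leaves open, such as how scores are normalized across passages of different lengths.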

Anthology ID: 2025.emnlp-main.1214
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 23791–23805
URL: https://aclanthology.org/2025.emnlp-main.1214/
DOI: 10.18653/v1/2025.emnlp-main.1214
Cite (ACL): Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, and Xi Ye. 2025. Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 23791–23805, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking (Zhang et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1214.pdf
Checklist: 2025.emnlp-main.1214.checklist.pdf
