Long-context Language Models Fail in Basic Retrieval Tasks Without Sufficient Reasoning Steps
Yijiong Yu | Zhixiao Qi | Yongfeng Huang | Wei Wang | Weifeng Liu | Ran Chen | Ji Pei
Findings of the Association for Computational Linguistics: EMNLP 2025
Long-context language models (LCLMs), characterized by their extensive context windows, are becoming popular. However, although they are nearly perfect at standard long-context retrieval tasks, our evaluations demonstrate that they fail in some basic cases. We further find that these failures can be well addressed with a sufficient number of reasoning steps, guided by specific CoT prompts. This result highlights the potential necessity of solving certain long-context tasks with long-CoT methods, whereas previous long-context benchmarks have ignored the need for long reasoning in long-context tasks and treated them as direct QA tasks. Our code and datasets are available at https://github.com/yuyijiong/hard_retrieval_for_llm
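The following is a minimal sketch, not the authors' code, of the idea the abstract describes: instead of posing a long-context retrieval question as direct QA, the prompt instructs the model to produce explicit intermediate reasoning steps before answering. The function names and the exact prompt wording are assumptions for illustration; the actual prompts and datasets are in the linked repository.

```python
def build_cot_retrieval_prompt(context: str, question: str) -> str:
    """Wrap a long-context retrieval question in a CoT-style instruction
    that elicits intermediate reasoning steps before the final answer.
    (Hypothetical wording; see the paper's repository for real prompts.)"""
    return (
        f"{context}\n\n"
        f"Question: {question}\n"
        "Do not answer immediately. First, quote every passage from the "
        "context that is relevant to the question. Then reason over these "
        "passages step by step, and only afterwards state the final answer."
    )


def build_direct_prompt(context: str, question: str) -> str:
    """Direct-QA baseline, as in previous long-context benchmarks:
    the question is appended and an immediate answer is expected."""
    return f"{context}\n\nQuestion: {question}\nAnswer:"
```

Comparing model outputs under the two prompt styles on the same retrieval case reproduces the contrast the paper draws between direct QA and long-CoT prompting.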