Zhewen Tan
2026
ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents
Yilun Yao | Shan Huang | Elsie Dai | Zhewen Tan | Zhenyu Duan | Shousheng Jia | Yanbing Jiang | Tong Yang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models are increasingly deployed as research agents for deep search and long-horizon information seeking, yet their performance often degrades as interaction histories grow. This degradation, known as context rot, reflects a failure to maintain coherent and task-relevant internal states over extended reasoning horizons. Existing approaches primarily manage context through raw accumulation or passive summarization, treating it as a static artifact and allowing early errors or misplaced emphasis to persist. Motivated by this limitation, we propose ARC, the first framework to systematically formulate context management as an active, reflection-driven process that treats context as a dynamic internal reasoning state during execution. ARC operationalizes this view through reflection-driven monitoring and revision, allowing agents to actively reorganize their working context when misalignment or degradation is detected. Experiments on challenging long-horizon information-seeking benchmarks show that ARC consistently outperforms passive context compression methods, achieving up to an 11% absolute improvement in accuracy on BrowseComp-ZH with Qwen2.5-32B-Instruct.
TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
Zhewen Tan | Wenhan Yu | Jianfeng Si | Tongxin Liu | Kaiqi Guan | Huiyan Jin | Jiawen Tao | Xiaokun Yuan | Xiangzheng Zhang | Duohe Ma | Tong Yang | Lin Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In recent years, safety risks associated with large language models have become increasingly prominent, highlighting the urgent need to mitigate the generation of toxic and harmful content. The mainstream paradigm for LLM safety alignment typically adopts a collaborative framework involving three roles: an attacker for adversarial prompt generation, a defender for safety defense, and an evaluator for response assessment. In this paper, we propose TriPlay-RL, a closed-loop reinforcement learning framework that enables iterative, co-improving collaboration among the three roles with near-zero manual annotation. Experimental results show that the attacker preserves high output diversity while achieving a 20%–50% improvement in adversarial effectiveness. The defender attains 10%–30% gains in safety performance without degrading general reasoning capability, and the evaluator continuously refines its fine-grained judgment ability through iterations, accurately distinguishing unsafe responses, simple refusals, and useful guidance. Overall, our framework establishes an efficient and scalable paradigm for LLM safety alignment, enabling continuous co-evolution within a unified learning loop. The code is available at https://github.com/Qihoo360/TriPlay-RL.