Beyond the Context Window: Scaling Agentic RL via End-to-end Optimized Context Compression

Miao Lu; Weiwei Sun; Weihua Du; Zhan Ling; Xuesong Yao; Kang Liu; Jiecao Chen

Abstract

We study reinforcement learning (RL) fine-tuning of large language model (LLM) agents for long-horizon, multi-turn tool use, where context length quickly becomes a fundamental bottleneck. Existing multi-turn RL pipelines suffer from degraded instruction following, excessive rollout costs, and, most importantly, strict context limits. To address these challenges, we introduce summarization-based context management into training. Specifically, it periodically compresses the tool-use history into LLM-generated summaries that retain task-relevant information, keeping the context compact while enabling the agent to scale beyond the fixed context window. Building on this formulation, we derive a policy gradient representation that enables standard LLM RL infrastructures to optimize both tool-use behaviors and summarization strategies in an end-to-end fashion. We instantiate this framework with SUmmarization augmented Policy Optimization (SUPO), an LLM RL algorithm that enables long-horizon training beyond a fixed context limit. Experiments on interactive function calling and searching tasks demonstrate that SUPO significantly improves the success rate while maintaining the same or even lower working context length compared to baselines. We also demonstrate that, for complex searching tasks, SUPO can further improve evaluation performance when the maximum number of summarization rounds at test time is scaled beyond that used during training.
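The summarization-based context management described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `run_with_summarization`, the callables `act` and `summarize`, the character-based length budget (a stand-in for a token budget), and the choice to restart the context from the task prompt plus the summary are all assumptions made for illustration.

```python
from typing import Callable, List


def run_with_summarization(
    task: str,
    act: Callable[[str], str],
    summarize: Callable[[str], str],
    max_context_chars: int,
    max_steps: int,
) -> List[str]:
    """Roll out an agent and compress its history when the context gets too long.

    Returns the working contexts the agent saw at each step.
    All names and the character-based budget are hypothetical.
    """
    context = task
    seen: List[str] = []
    for _ in range(max_steps):
        seen.append(context)
        # One tool-use turn: the agent acts and an observation comes back.
        observation = act(context)
        context = context + "\n" + observation
        # Over budget: replace the accumulated history with a summary
        # (assumed here to be produced by the same LLM).
        if len(context) > max_context_chars:
            context = task + "\n" + summarize(context)
    return seen
```

With a stub `act` that returns a fixed-size observation and a stub `summarize` that returns a short string, every working context stays within the budget even though the total interaction is longer than the budget allows.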

Anthology ID: 2026.acl-long.966
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 21074–21125
URL: https://aclanthology.org/2026.acl-long.966/
Cite (ACL): Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, and Jiecao Chen. 2026. Beyond the Context Window: Scaling Agentic RL via End-to-end Optimized Context Compression. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21074–21125, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Beyond the Context Window: Scaling Agentic RL via End-to-end Optimized Context Compression (Lu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.966.pdf
Checklist: 2026.acl-long.966.checklist.pdf
