Jiangming Shu

Also published as: 江明


2026

Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awareness of the agent’s reasoning state, leading to suboptimal decisions. We propose Memory-as-Action (MemAct), a framework that treats working memory management as learnable policy actions. By formulating context management as in-place editing operations (deletion, insertion), MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning. To address the computational challenges of dynamic context updates, we introduce Dynamic Context Policy Optimization, which restores training efficiency without compromising reasoning integrity. Experiments show that MemAct-RL-14B matches the accuracy of models 16× larger while reducing average context length by 51%, with learned strategies that adapt to model capabilities and generalize across task complexities. The code and datasets are available at https://github.com/ADaM-BJTU/MemAct.

2024

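“Automatic report generation has significant potential to improve work efficiency and save human resources. The emergence of large language models has improved the fluency and interpretability of generated reports. However, existing work still relies on manual effort and lacks flexibility and richness. Moreover, erroneous or redundant outputs from small models, together with the inherent randomness of large models, make report quality unstable. This paper proposes AutoRG, an automatic report generation framework based on collaboration between large and small models. It uses the tool understanding and planning capabilities of large models to reduce manual intervention and enrich reports, and it improves report stability through information correction and report iteration mechanisms. Using automatic patent report generation as the scenario, we comprehensively evaluate AutoRG along multiple dimensions. The results show that the framework offers significant advantages in improving the richness and quality stability of generated reports.”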