MemCoRL: Alternating Co-Optimization of Memory Retrieval and Utilization via Collaborative Reinforcement Learning

Yuewen Liu; Peng Xu; Muxi Diao; Anyi Zhang; Yang Li; Yutong Zhang

Abstract

Large Language Models (LLMs) are inherently constrained by their fixed-length context windows, which limit their ability to retain and utilize information across long-term interactions. To address this limitation, recent work has proposed external memory modules for LLMs. Using a memory module typically involves two stages: evidence retrieval and memory utilization. While prior work focuses on the architecture of memory modules and on the retrieval stage, the equally critical memory utilization stage remains underexplored. To address this gap, we propose MemCoRL, a two-stage alternating co-optimization reinforcement learning method. Stage 1 optimizes evidence retrieval, using citation feedback and semantic accuracy from the utilization stage as rewards. Stage 2 optimizes utilization with rewards that combine semantic similarity and lexical overlap. Iterative co-optimization establishes a positive feedback loop: better retrieval improves memory utilization, which in turn refines the retrieval rewards. Experimental results show that our approach outperforms the leading baselines on both lexical overlap and semantic similarity metrics, confirming the benefit of co-optimizing memory retrieval and memory utilization.
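The two reward signals and the alternating schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mixing weights `alpha` and `beta`, the use of token-level F1 for lexical overlap, and the bag-of-words cosine used as a stand-in for semantic similarity are all assumptions made for illustration. A real system would presumably use a learned embedding model or judge for the semantic term and an RL algorithm to update each policy.

```python
from collections import Counter
import math


def token_f1(prediction: str, reference: str) -> float:
    """Lexical overlap measured as token-level F1 (an assumed choice)."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def bow_cosine(a: str, b: str) -> float:
    """Hypothetical stand-in for semantic similarity: bag-of-words cosine."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def utilization_reward(answer: str, reference: str, alpha: float = 0.5) -> float:
    """Stage 2 reward: semantic similarity mixed with lexical overlap.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    return alpha * bow_cosine(answer, reference) + (1 - alpha) * token_f1(
        answer, reference
    )


def retrieval_reward(retrieved_ids, cited_ids, answer, reference, beta=0.5):
    """Stage 1 reward: citation feedback mixed with semantic accuracy.

    Citation feedback is approximated here as the fraction of retrieved
    memory entries that the utilization output actually cited.
    beta is a hypothetical mixing weight, not a value from the paper.
    """
    if retrieved_ids:
        citation = len(set(retrieved_ids) & set(cited_ids)) / len(retrieved_ids)
    else:
        citation = 0.0
    return beta * citation + (1 - beta) * bow_cosine(answer, reference)


def alternating_schedule(num_rounds: int):
    """Yield (round, stage) pairs: retrieval first, then utilization."""
    for r in range(num_rounds):
        yield r, "retrieval"
        yield r, "utilization"
```

In each round, the retrieval policy would be updated with `retrieval_reward` while the utilization policy is held fixed, and then the utilization policy would be updated with `utilization_reward` while the retriever is held fixed. Because the Stage 1 reward depends on the utilization output, a better utilization policy yields a more informative retrieval signal in the next round.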

Anthology ID: 2026.acl-long.1804
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 38912–38924
URL: https://aclanthology.org/2026.acl-long.1804/
Cite (ACL): Yuewen Liu, Peng Xu, Muxi Diao, Anyi Zhang, Yang Li, and Yutong Zhang. 2026. MemCoRL: Alternating Co-Optimization of Memory Retrieval and Utilization via Collaborative Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 38912–38924, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MemCoRL: Alternating Co-Optimization of Memory Retrieval and Utilization via Collaborative Reinforcement Learning (Liu et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1804.pdf
Checklist: 2026.acl-long.1804.checklist.pdf
