Generative Gamer: Learning Equilibrium Strategy by LLM-driven Dynamic Deduction

Yadong Zhang; Xinshu Shen; Yupei Ren; Shangqing Zhao; Man Lan

Abstract

Large Language Models (LLMs) have demonstrated remarkable general capabilities, yet they falter in domains requiring deep strategic reasoning. A primary obstacle is the need to navigate a game tree that grows exponentially with search depth, a task for which their generative nature is ill-suited. To address this, we introduce Generative Gamer (GenGamer), a framework that trains LLMs to reason like an expert player. Instead of attempting an exhaustive search, GenGamer learns to generate a compact, pruned reasoning trajectory, termed a Dynamic Deduction. This is achieved by integrating three key strategies: action pruning based on policy confidence, state pruning via value estimation, and branch pruning inspired by alpha-beta principles. Furthermore, to train the model effectively, we propose the Deduction Tree Reward (DTR), a process-oriented mechanism that provides step-by-step feedback on the quality of the reasoning process rather than relying solely on the final game outcome. Experiments on games such as Tic-Tac-Toe and Leduc Poker demonstrate that GenGamer significantly enhances the strategic capabilities of LLMs, enabling them to surpass current state-of-the-art language models.
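
The branch pruning mentioned in the abstract is described only as "inspired by alpha-beta principles". As background, the following is a minimal sketch of classical alpha-beta pruning on Tic-Tac-Toe, not GenGamer itself: it involves no LLM, no learned policy or value estimates, and no Deduction Tree Reward. All names (`winner`, `search`, `solve`) are hypothetical and chosen for illustration. The sketch only shows why cutting branches that cannot change the result shrinks the explored tree while leaving the game value unchanged.

```python
from typing import Dict, List, Optional, Tuple

# The eight winning lines of a 3x3 board, indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def search(board: List[str], player: str, alpha: int, beta: int,
           prune: bool, stats: Dict[str, int]) -> int:
    """Minimax value from X's perspective: +1 X wins, 0 draw, -1 O wins."""
    stats["nodes"] += 1
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0  # full board with no winner is a draw
    best = -2 if player == "X" else 2
    for m in moves:
        board[m] = player
        value = search(board, "O" if player == "X" else "X",
                       alpha, beta, prune, stats)
        board[m] = " "  # undo the move
        if player == "X":  # maximizing player
            best = max(best, value)
            alpha = max(alpha, best)
        else:              # minimizing player
            best = min(best, value)
            beta = min(beta, best)
        # Alpha-beta cut: the remaining siblings cannot affect the result.
        if prune and alpha >= beta:
            break
    return best


def solve(prune: bool) -> Tuple[int, int]:
    """Solve the empty board; return (game value, number of nodes visited)."""
    stats = {"nodes": 0}
    value = search([" "] * 9, "X", -2, 2, prune, stats)
    return value, stats["nodes"]
```

Calling `solve(prune=False)` and `solve(prune=True)` should yield the same game value (a draw under optimal play), with the pruned search visiting fewer nodes. Per the abstract, GenGamer combines this kind of branch cut with confidence-based action pruning and value-based state pruning, which this sketch does not attempt to reproduce.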

Anthology ID: 2026.acl-long.574
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 12604–12617
URL: https://aclanthology.org/2026.acl-long.574/
Cite (ACL): Yadong Zhang, Xinshu Shen, Yupei Ren, Shangqing Zhao, and Man Lan. 2026. Generative Gamer: Learning Equilibrium Strategy by LLM-driven Dynamic Deduction. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12604–12617, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Generative Gamer: Learning Equilibrium Strategy by LLM-driven Dynamic Deduction (Zhang et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.574.pdf
Checklist: 2026.acl-long.574.checklist.pdf
