Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts

Wenxuan Lu; Jiangyang He; Zhanqiu Zhang; Steven Y. Guo; Tianning Zang

doi:10.18653/v1/2025.acl-long.643

Abstract

Developing agents capable of fluid gameplay in first/third-person games without API access remains a critical challenge in Artificial General Intelligence (AGI). Recent efforts leverage Vision Language Models (VLMs) as direct controllers, frequently pausing the game to analyze screens and plan actions through language reasoning. However, this inefficient paradigm fundamentally restricts agents to basic, non-fluent interactions: relying on isolated VLM reasoning for each action makes it impossible to handle tasks requiring high reactivity (e.g., FPS shooting) or dynamic adaptability (e.g., ACT combat). To address this, we propose a paradigm shift in gameplay agent design: instead of controlling the game directly, the VLM serves as a developer, creating specialized execution modules tailored to tasks such as shooting and combat. These modules handle real-time game interactions, elevating the VLM to the role of a high-level developer. Building upon this paradigm, we introduce GameSense, a gameplay agent framework in which the VLM develops task-specific game sense modules by observing task execution and leveraging vision tools and neural network training pipelines. These modules encapsulate action-feedback logic, ranging from direct action rules to neural-network-based decisions. Experiments demonstrate that our framework is the first to achieve fluent gameplay across diverse genres, including ACT, FPS, and Flappy Bird, setting a new benchmark for game-playing agents.
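
To make the controller-versus-developer distinction concrete, here is a minimal Python sketch under stated assumptions. It is not the GameSense implementation: the `Observation` fields, the `make_flappy_rule_module` rule, the `margin` value, and the `MODULES` registry are all hypothetical. It illustrates only the "direct action rules" end of the module spectrum described above: a cheap function queried on every frame, with the VLM's role (writing or tuning such a module) replaced here by a hard-coded parameter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Observation:
    # Hypothetical features a vision tool might extract from one frame.
    # Screen coordinates: y grows downward, so a larger y means lower.
    bird_y: float
    gap_center_y: float


# A "game sense module" here is simply a fast observation -> action function.
GameSenseModule = Callable[[Observation], str]


def make_flappy_rule_module(margin: float) -> GameSenseModule:
    """Hypothetical direct action rule: flap once the bird drops below the gap center."""
    def module(obs: Observation) -> str:
        return "flap" if obs.bird_y > obs.gap_center_y + margin else "noop"
    return module


# Task-specific modules. In the framing above a VLM would develop these;
# here the parameter (margin=5.0) is hard-coded purely for illustration.
MODULES: Dict[str, GameSenseModule] = {
    "flappy_bird": make_flappy_rule_module(margin=5.0),
}


def run_frames(task: str, frames: List[Observation]) -> List[str]:
    # The module, not the VLM, is queried on every frame,
    # so no per-action VLM reasoning step is needed.
    module = MODULES[task]
    return [module(obs) for obs in frames]


if __name__ == "__main__":
    frames = [Observation(120, 100), Observation(100, 100), Observation(90, 100)]
    print(run_frames("flappy_bird", frames))  # ['flap', 'noop', 'noop']
```

The design choice being illustrated is where the expensive reasoning happens: it is spent once, when a module is built or tuned, rather than on every action, which is what lets the per-frame loop stay fast.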

Anthology ID: 2025.acl-long.643
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13132–13152
URL: https://aclanthology.org/2025.acl-long.643/
DOI: 10.18653/v1/2025.acl-long.643
Cite (ACL): Wenxuan Lu, Jiangyang He, Zhanqiu Zhang, Steven Y. Guo, and Tianning Zang. 2025. Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13132–13152, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts (Lu et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.643.pdf