Understanding Game-Playing Agents with Natural Language Annotations

Nicholas Tomlin, Andre He, Dan Klein


Abstract
We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
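
The linear probing setup described in the abstract can be illustrated with a minimal sketch. It assumes one fixed-length activation vector per board state, taken from a single layer of a policy network, and one binary label saying whether the associated comment mentions a given term (e.g., ko). The synthetic activations, the function names `train_linear_probe` and `probe_accuracy`, and the hyperparameters are hypothetical and are not taken from the paper or its released code. The shuffled-label run is a generic sanity baseline, not necessarily the comparison the authors used.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=200, l2=1e-3):
    """Fit a logistic-regression probe: P(term mentioned) = sigmoid(x @ w + b)."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels                      # gradient of the log loss w.r.t. logits
        w -= lr * (features.T @ grad / n + l2 * w)  # full-batch step with L2 penalty
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(features, labels, w, b):
    """Fraction of examples where the thresholded probe matches the label."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# Synthetic stand-in for layer activations (assumption: 64-dim vectors).
rng = np.random.default_rng(0)
n_states, dim = 2000, 64
activations = rng.normal(size=(n_states, dim))

# Hypothetical "concept direction": the label is linearly decodable by construction.
concept_direction = rng.normal(size=dim)
mentions_term = (activations @ concept_direction > 0).astype(float)

# Train on the first 1500 states and evaluate on the remaining 500.
train_x, test_x = activations[:1500], activations[1500:]
train_y, test_y = mentions_term[:1500], mentions_term[1500:]

w, b = train_linear_probe(train_x, train_y)
probe_acc = probe_accuracy(test_x, test_y, w, b)

# Sanity baseline: a probe trained on shuffled labels should stay near chance.
shuffled_y = rng.permutation(train_y)
w_ctrl, b_ctrl = train_linear_probe(train_x, shuffled_y)
control_acc = probe_accuracy(test_x, test_y, w_ctrl, b_ctrl)

print(f"probe accuracy:   {probe_acc:.3f}")
print(f"control accuracy: {control_acc:.3f}")
```

With real data, the same probe would be fit separately on the activations of each layer, and the held-out accuracies compared across layers to see where a term is most easily predicted.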
Anthology ID:
2022.acl-short.90
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
797–807
URL:
https://aclanthology.org/2022.acl-short.90
DOI:
10.18653/v1/2022.acl-short.90
Cite (ACL):
Nicholas Tomlin, Andre He, and Dan Klein. 2022. Understanding Game-Playing Agents with Natural Language Annotations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 797–807, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Understanding Game-Playing Agents with Natural Language Annotations (Tomlin et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-short.90.pdf
Software:
 2022.acl-short.90.software.zip
Video:
 https://aclanthology.org/2022.acl-short.90.mp4
Code:
 andrehe02/go-probe