Semi-Supervised Policy Initialization for Playing Games with Language Hints

Tsu-Jui Fu; William Yang Wang

doi:10.18653/v1/2021.naacl-main.249

Abstract

Using natural language as a hint can supply an additional reward for playing sparse-reward games. Achieving a goal typically involves several different hints, but the hints provided are usually incomplete. The unmentioned, latent hints still rely on the sparse reward signal, which makes learning difficult. In this paper, we propose semi-supervised initialization (SSI), which allows the agent to learn from various possible hints before it is trained on different tasks. Experiments show that SSI not only helps the agent learn faster (1.2x) but also gives the final policy a higher success rate (11% relative improvement).
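The abstract does not give the reward formulation, so the following is only a rough illustration under stated assumptions, not the authors' code. It assumes a hint names a subgoal and that the agent earns a fixed bonus (the hypothetical `bonus` parameter) whenever a hinted subgoal is newly achieved; the function name `shaped_reward` and the subgoal strings are likewise invented for illustration. The sketch shows why unmentioned (latent) hints add no signal beyond the sparse environment reward, and why exposing the agent to all possible hints before task training, in the spirit of SSI, gives it denser feedback.

```python
from typing import Iterable, Set

# Hypothetical state representation: the set of subgoal names achieved so far.
State = Set[str]


def shaped_reward(
    prev_state: State,
    state: State,
    env_reward: float,
    hints: Iterable[str],
    bonus: float = 0.1,  # assumed per-hint bonus; not taken from the paper
) -> float:
    """Sparse environment reward plus a bonus per hinted subgoal newly achieved.

    Subgoals absent from `hints` (latent hints) earn nothing extra, so the agent
    can only learn about them through the sparse `env_reward`.
    """
    newly_achieved = state - prev_state
    # Count each hinted subgoal at most once, even if `hints` has duplicates.
    hinted_progress = len(set(hints) & newly_achieved)
    return env_reward + bonus * hinted_progress


if __name__ == "__main__":
    all_possible_hints = {"get_key", "open_door", "reach_goal"}
    given_hints = ["get_key"]  # incomplete hints for one task
    prev, cur = {"get_key"}, {"get_key", "open_door"}
    # "open_door" is latent for this task, so no bonus is given here.
    print(shaped_reward(prev, cur, 0.0, given_hints))
    # With every possible hint available (as during an SSI-style phase),
    # the same transition is rewarded.
    print(shaped_reward(prev, cur, 0.0, all_possible_hints))
```

A plain set intersection keeps the bonus well defined when the hint list repeats a subgoal; a real agent would need a learned or rule-based detector to decide when a natural-language hint has been satisfied.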

Anthology ID:: 2021.naacl-main.249
Volume:: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:: June
Year:: 2021
Address:: Online
Editors:: Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:: NAACL
Publisher:: Association for Computational Linguistics
Pages:: 3112–3116
URL:: https://aclanthology.org/2021.naacl-main.249/
DOI:: 10.18653/v1/2021.naacl-main.249
Cite (ACL):: Tsu-Jui Fu and William Yang Wang. 2021. Semi-Supervised Policy Initialization for Playing Games with Language Hints. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3112–3116, Online. Association for Computational Linguistics.
Cite (Informal):: Semi-Supervised Policy Initialization for Playing Games with Language Hints (Fu & Wang, NAACL 2021)
PDF:: https://aclanthology.org/2021.naacl-main.249.pdf
Video:: https://aclanthology.org/2021.naacl-main.249.mp4
