Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning

Sayan Ghosh, Shashank Srivastava


Abstract
Mapping natural language instructions to programs that computers can process is a fundamental challenge. Existing approaches focus on likelihood-based training or using reinforcement learning to fine-tune models based on a single reward. In this paper, we pose program generation from language as Inverse Reinforcement Learning. We introduce several interpretable reward components and jointly learn (1) a reward function that linearly combines them, and (2) a policy for program generation. Fine-tuning with our approach achieves significantly better performance than competitive methods using Reinforcement Learning (RL). On the VirtualHome framework, we get improvements of up to 9.0% on the Longest Common Subsequence metric and 14.7% on recall-based metrics over previous work on this framework (Puig et al., 2018). The approach is data-efficient, showing larger gains in performance in the low-data regime. Generated programs are also preferred by human evaluators over an RL-based approach, and rated higher on relevance, completeness, and human-likeness.
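As a rough illustration of what "a reward function that linearly combines" interpretable components could look like, the sketch below scores a predicted program against a reference using two components and a weighted sum. It makes several assumptions that the abstract does not confirm: the component names (`lcs_score`, `step_recall`), the normalization of the LCS by the longer program, and the example steps are all hypothetical. The weights are fixed by hand here, whereas the paper learns them jointly with the policy.

```python
from typing import Dict, Sequence

Program = Sequence[str]  # a program is treated as a sequence of step strings


def lcs_length(a: Program, b: Program) -> int:
    """Length of the longest common subsequence of two step sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(pred: Program, ref: Program) -> float:
    """LCS normalized by the longer program (an assumed normalization)."""
    longest = max(len(pred), len(ref))
    return lcs_length(pred, ref) / longest if longest else 1.0


def step_recall(pred: Program, ref: Program) -> float:
    """Fraction of reference steps that appear anywhere in the prediction."""
    if not ref:
        return 1.0
    predicted = set(pred)
    return sum(step in predicted for step in ref) / len(ref)


# Hypothetical component set; the paper's actual components may differ.
COMPONENTS = {"lcs": lcs_score, "recall": step_recall}


def combined_reward(pred: Program, ref: Program, weights: Dict[str, float]) -> float:
    """Linear combination of the reward components under the given weights."""
    return sum(weights[name] * fn(pred, ref) for name, fn in COMPONENTS.items())


reference = ["walk kitchen", "open fridge", "grab milk", "close fridge"]
predicted = ["walk kitchen", "grab milk", "close fridge"]
reward = combined_reward(predicted, reference, {"lcs": 0.5, "recall": 0.5})
```

In an inverse reinforcement learning setup, the entries of `weights` would be the learned parameters of the reward function rather than constants, and `combined_reward` would supply the training signal for the program-generation policy.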
Anthology ID: 2021.findings-emnlp.125
Volume: Findings of the Association for Computational Linguistics: EMNLP 2021
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: Findings
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1449–1462
URL: https://aclanthology.org/2021.findings-emnlp.125
DOI: 10.18653/v1/2021.findings-emnlp.125
Cite (ACL): Sayan Ghosh and Shashank Srivastava. 2021. Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1449–1462, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning (Ghosh & Srivastava, Findings 2021)
PDF: https://aclanthology.org/2021.findings-emnlp.125.pdf
Video: https://aclanthology.org/2021.findings-emnlp.125.mp4
Code: sgdgp/virtualhome_irl