@inproceedings{godin-etal-2019-learning,
title = "Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering",
author = "Godin, Fr{\'e}deric and
Kumar, Anjishnu and
Mittal, Arpit",
editor = "Loukina, Anastassia and
Morales, Michelle and
Kumar, Rohit",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-2016",
doi = "10.18653/v1/N19-2016",
pages = "122--129",
abstract = "In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while only not answering a limited number of previously correctly answered questions. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="godin-etal-2019-learning">
    <titleInfo>
      <title>Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Fréderic</namePart>
      <namePart type="family">Godin</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Anjishnu</namePart>
      <namePart type="family">Kumar</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Arpit</namePart>
      <namePart type="family">Mittal</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2019-06</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Anastassia</namePart>
        <namePart type="family">Loukina</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Michelle</namePart>
        <namePart type="family">Morales</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Rohit</namePart>
        <namePart type="family">Kumar</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Minneapolis, Minnesota</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while only not answering a limited number of previously correctly answered questions. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.</abstract>
    <identifier type="citekey">godin-etal-2019-learning</identifier>
    <identifier type="doi">10.18653/v1/N19-2016</identifier>
    <location>
      <url>https://aclanthology.org/N19-2016</url>
    </location>
    <part>
      <date>2019-06</date>
      <extent unit="page">
        <start>122</start>
        <end>129</end>
      </extent>
    </part>
  </mods>
</modsCollection>
%0 Conference Proceedings
%T Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering
%A Godin, Fréderic
%A Kumar, Anjishnu
%A Mittal, Arpit
%Y Loukina, Anastassia
%Y Morales, Michelle
%Y Kumar, Rohit
%S Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
%D 2019
%8 June
%I Association for Computational Linguistics
%C Minneapolis, Minnesota
%F godin-etal-2019-learning
%X In this paper, we investigate the challenges of using reinforcement learning agents for question-answering over knowledge graphs for real-world applications. We examine the performance metrics used by state-of-the-art systems and determine that they are inadequate for such settings. More specifically, they do not evaluate the systems correctly for situations when there is no answer available and thus agents optimized for these metrics are poor at modeling confidence. We introduce a simple new performance metric for evaluating question-answering agents that is more representative of practical usage conditions, and optimize for this metric by extending the binary reward structure used in prior work to a ternary reward structure which also rewards an agent for not answering a question rather than giving an incorrect answer. We show that this can drastically improve the precision of answered questions while only not answering a limited number of previously correctly answered questions. Employing a supervised learning strategy using depth-first-search paths to bootstrap the reinforcement learning algorithm further improves performance.
%R 10.18653/v1/N19-2016
%U https://aclanthology.org/N19-2016
%U https://doi.org/10.18653/v1/N19-2016
%P 122-129
Markdown (Informal)
[Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering](https://aclanthology.org/N19-2016) (Godin et al., NAACL 2019)
ACL
Fréderic Godin, Anjishnu Kumar, and Arpit Mittal. 2019. Learning When Not to Answer: a Ternary Reward Structure for Reinforcement Learning Based Question Answering. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pages 122–129, Minneapolis, Minnesota. Association for Computational Linguistics.
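The abstract describes the core idea: extend the binary (correct/incorrect) reward used in prior work to a ternary one that also rewards the agent for explicitly declining to answer. A minimal sketch of such a terminal reward function is given below; the specific reward values, the NO_ANSWER sentinel, and the function name are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a ternary terminal reward for a KG question-answering agent.
# Assumed, illustrative values: the paper only states that abstaining is rewarded
# over answering incorrectly, not the exact numbers used here.
CORRECT_REWARD = 1.0   # agent returns a correct answer entity
ABSTAIN_REWARD = 0.5   # agent explicitly declines to answer (assumed value)
WRONG_REWARD = 0.0     # agent returns an incorrect answer

NO_ANSWER = "NO_ANSWER"  # hypothetical sentinel action for "do not answer"


def ternary_reward(predicted_entity: str, gold_entities: set) -> float:
    """Terminal reward for one question-answering episode."""
    if predicted_entity == NO_ANSWER:
        return ABSTAIN_REWARD
    if predicted_entity in gold_entities:
        return CORRECT_REWARD
    return WRONG_REWARD


if __name__ == "__main__":
    # Question whose answer is missing from the knowledge graph: abstaining
    # now earns more than guessing wrong, unlike under a binary reward.
    print(ternary_reward(NO_ANSWER, set()))      # 0.5
    print(ternary_reward("Paris", {"Paris"}))    # 1.0
    print(ternary_reward("Lyon", {"Paris"}))     # 0.0
```

With the rewards ordered correct > abstain > wrong, an agent trained on this signal is encouraged to abstain whenever it is not confident enough that answering will beat the abstention reward in expectation, which is the behavior the abstract reports improves precision on answered questions.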