Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

Authors: Khanh Nguyen, Hal Daumé III, Jordan Boyd-Graber
Published in: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, September 2017, pages 1464-1474
Editors: Martha Palmer, Rebecca Hwa, Sebastian Riedel
Publisher: Association for Computational Linguistics
Type: conference publication
DOI: 10.18653/v1/D17-1153
URL: https://aclanthology.org/D17-1153/
Citation key: nguyen-etal-2017-reinforcement