DORB: Dynamically Optimizing Multiple Rewards with Bandits

Authors: Ramakanth Pasunuru, Han Guo, Mohit Bansal
Published in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Publisher: Association for Computational Linguistics
Location: Online
Date: 2020-11
Pages: 7766-7780
Type: conference publication
Anthology ID: pasunuru-etal-2020-dorb
DOI: 10.18653/v1/2020.emnlp-main.625
URL: https://aclanthology.org/2020.emnlp-main.625/