Title: Dialog policy optimization for low resource setting using Self-play and Reward based Sampling
Authors: Tharindu Madusanka, Durashi Langappuli, Thisara Welmilla, Uthayasanker Thayasivam, Sanath Jayasena
Date: 2020-10
Type: text (conference publication)
Proceedings: Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
Editors: Minh Le Nguyen, Mai Chi Luong, Sanghoun Song
Publisher: Association for Computational Linguistics
Place: Hanoi, Vietnam
Pages: 178-187
Citation key: madusanka-etal-2020-dialog
URL: https://aclanthology.org/2020.paclic-1.21/