Dialog policy optimization for low resource setting using Self-play and Reward based Sampling
Tharindu Madusanka | Durashi Langappuli | Thisara Welmilla | Uthayasanker Thayasivam | Sanath Jayasena
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation