KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

Xiao Yu; Qingyang Wu; Kun Qian; Zhou Yu

doi:10.18653/v1/2023.emnlp-main.759

Abstract

In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize response for task-related metrics. However, RL often needs to perform exploration, which can be time-consuming due to the slow auto-regressive sequence generation process. We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting. First, we use a faster generation procedure that samples from independent next-word distributions after training the language model (LM) with supervised learning. We then introduce a fine-grained reward function to help the model focus on learning key information in a dialog, by measuring the importance and semantic closeness of each generated token. Experiments on the MultiWoZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance on the end-to-end response generation task, with a 15% training time reduction compared to a standard RL algorithm using auto-regressive generation.
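The contrast between auto-regressive generation and sampling from independent next-word distributions can be illustrated with a toy sketch. This is one plausible reading of the abstract, not the authors' implementation: the vocabulary, the `toy_next_word_probs` stand-in for a trained LM, and both sampling helpers are hypothetical names introduced here for illustration. The sketch assumes that next-word sampling conditions each position on the gold response prefix (as in teacher forcing), so that all distributions can be computed together rather than one step at a time.

```python
import random

# Hypothetical toy vocabulary; not taken from the paper.
VOCAB = ["the", "hotel", "is", "cheap", "expensive"]


def toy_next_word_probs(prefix):
    """Stand-in for a trained LM: return a next-word distribution over VOCAB.

    The favored word is the one following the last prefix token in VOCAB,
    so the distribution depends on what the prefix actually contains.
    """
    favored = (VOCAB.index(prefix[-1]) + 1) % len(VOCAB) if prefix else 0
    return [0.6 if i == favored else 0.1 for i in range(len(VOCAB))]


def autoregressive_sample(length, rng):
    """Sample token by token; each step conditions on earlier samples."""
    tokens = []
    for _ in range(length):
        probs = toy_next_word_probs(tokens)  # must wait for the previous sample
        tokens.append(rng.choices(VOCAB, weights=probs, k=1)[0])
    return tokens


def next_word_sample(gold, rng):
    """Sample every position independently, conditioning on the gold prefix."""
    # Assumption: the distributions depend only on gold tokens, so a real LM
    # could produce all of them in a single forward pass.
    all_probs = [toy_next_word_probs(gold[:t]) for t in range(len(gold))]
    return [rng.choices(VOCAB, weights=p, k=1)[0] for p in all_probs]


if __name__ == "__main__":
    gold = ["the", "hotel", "is", "cheap"]
    print(autoregressive_sample(len(gold), random.Random(0)))
    print(next_word_sample(gold, random.Random(0)))
```

In the auto-regressive helper, the loop cannot be parallelized because each distribution depends on the previously sampled token. In the next-word helper, the list of distributions is fixed once the gold response is known, which is the property the abstract credits for faster training.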

Anthology ID: 2023.emnlp-main.759
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12338–12358
URL: https://aclanthology.org/2023.emnlp-main.759/
DOI: 10.18653/v1/2023.emnlp-main.759
Cite (ACL): Xiao Yu, Qingyang Wu, Kun Qian, and Zhou Yu. 2023. KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12338–12358, Singapore. Association for Computational Linguistics.
Cite (Informal): KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning (Yu et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.759.pdf
Video: https://aclanthology.org/2023.emnlp-main.759.mp4
