Offline-to-Online Co-Evolutional User Simulator and Dialogue System

Dafeng Chi, Yuzheng Zhuang, Yao Mu, Bin Wang, Jianzhu Bao, Yasheng Wang, Yuhan Dong, Xin Jiang, Qun Liu, Jianye Hao


Abstract
Reinforcement learning (RL) has emerged as a promising approach for fine-tuning offline-pretrained GPT-2 models in task-oriented dialogue (TOD) systems. To obtain human-like online interactions while extending the use of RL, it has become prevalent to build pretrained user simulators (US) alongside dialogue systems (DS) and to fine-tune them jointly via RL. However, joint training introduces a distributional shift problem caused by compounding exposure bias. Existing methods usually update the US and DS iteratively to ameliorate the ensuing non-stationarity problem, which can lead to sub-optimal policies and lower sample efficiency. To take a step further in tackling this problem, we introduce an Offline-to-oNline Co-Evolutional (ONCE) framework, which enables bias-aware concurrent joint updates for RL-based fine-tuning while taking advantage of GPT-2 based end-to-end modeling of the US and DS. Extensive experiments demonstrate that ONCE builds high-quality loops of policy learning and dialogue data collection, and achieves state-of-the-art online and offline evaluation results on the MultiWOZ2.1 dataset. Open-source code will be implemented with MindSpore (MS, 2022) and released on our homepage.
Anthology ID:
2022.seretod-1.11
Volume:
Proceedings of the Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems (SereTOD)
Month:
December
Year:
2022
Address:
Abu Dhabi, Beijing (Hybrid)
Editors:
Zhijian Ou, Junlan Feng, Juanzi Li
Venue:
SereTOD
Publisher:
Association for Computational Linguistics
Pages:
98–113
URL:
https://aclanthology.org/2022.seretod-1.11
DOI:
10.18653/v1/2022.seretod-1.11
Cite (ACL):
Dafeng Chi, Yuzheng Zhuang, Yao Mu, Bin Wang, Jianzhu Bao, Yasheng Wang, Yuhan Dong, Xin Jiang, Qun Liu, and Jianye Hao. 2022. Offline-to-Online Co-Evolutional User Simulator and Dialogue System. In Proceedings of the Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems (SereTOD), pages 98–113, Abu Dhabi, Beijing (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Offline-to-Online Co-Evolutional User Simulator and Dialogue System (Chi et al., SereTOD 2022)
PDF:
https://aclanthology.org/2022.seretod-1.11.pdf
Video:
https://aclanthology.org/2022.seretod-1.11.mp4