Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator

Qinyuan Cheng, Linyang Li, Guofeng Quan, Feng Gao, Xiaofeng Mou, Xipeng Qiu


Abstract
Task-Oriented Dialogue (TOD) systems have drawn increasing attention in recent studies. Current methods focus on constructing pre-trained models or fine-tuning strategies, while the evaluation of TOD is limited by a policy mismatch problem. That is, during evaluation, the user utterances are taken from the annotated dataset, whereas they should respond to the system's previous responses, which can have many valid alternatives besides the annotated texts. Therefore, in this work, we propose an interactive evaluation framework for TOD. We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues. In addition, we introduce a sentence-level and a session-level score to measure sentence fluency and session coherence in the interactive evaluation. Experimental results show that RL-based TOD systems trained with our proposed user simulator can achieve nearly 98% inform and success rates in the interactive evaluation on the MultiWOZ dataset, and that the proposed scores measure response quality beyond the inform and success rates. We hope that our work will encourage simulator-based interactive evaluations in the TOD task.
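The interactive setup described in the abstract, in which a goal-oriented user simulator and a dialogue system take turns conditioned on each other's actual outputs, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the `Agent` interface, the `[END]` token, the toy rule-based agents, and the `goal_coverage` stand-in metric are hypothetical and are not the paper's models, code, or its inform and success definitions.

```python
from typing import Callable, List

# Hypothetical interface: an agent maps the dialogue history to its next utterance.
Agent = Callable[[List[str]], str]

END_TOKEN = "[END]"  # assumed end-of-session marker emitted by the user simulator


def run_interactive_dialogue(user_sim: Agent, system: Agent,
                             max_turns: int = 20) -> List[str]:
    """Alternate user-simulator and system turns until the simulator ends the session.

    Each user utterance is generated from the history that contains the system's
    real responses, rather than being read from an annotated dataset.
    """
    history: List[str] = []
    for _ in range(max_turns):
        user_utt = user_sim(history)
        if user_utt == END_TOKEN:
            break
        history.append(user_utt)
        history.append(system(history))
    return history


def make_toy_user(goal_slots: List[str]) -> Agent:
    """Toy goal-driven user: requests each goal slot once, then ends the session."""
    def user(history: List[str]) -> str:
        turn = len(history) // 2  # one user turn and one system turn per exchange
        return f"request {goal_slots[turn]}" if turn < len(goal_slots) else END_TOKEN
    return user


def toy_system(history: List[str]) -> str:
    """Toy system: informs whatever slot the latest user turn requested."""
    return history[-1].replace("request", "inform", 1)


def goal_coverage(history: List[str], goal_slots: List[str]) -> float:
    """Stand-in metric: fraction of goal slots the system informed at least once."""
    if not goal_slots:
        return 1.0
    informed = {u.split(" ", 1)[1] for u in history[1::2] if u.startswith("inform ")}
    return len(informed & set(goal_slots)) / len(goal_slots)


goal = ["hotel-area", "hotel-price"]
dialogue = run_interactive_dialogue(make_toy_user(goal), toy_system)
print(dialogue)
print(goal_coverage(dialogue, goal))
```

In a real setting, both agents would be learned models and the scoring would follow the benchmark's own inform and success definitions; the sketch only shows the turn-taking loop that removes the dependence on annotated user utterances.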
Anthology ID:
2022.findings-emnlp.90
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1248–1259
URL:
https://aclanthology.org/2022.findings-emnlp.90
DOI:
10.18653/v1/2022.findings-emnlp.90
Cite (ACL):
Qinyuan Cheng, Linyang Li, Guofeng Quan, Feng Gao, Xiaofeng Mou, and Xipeng Qiu. 2022. Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1248–1259, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator (Cheng et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.90.pdf
Video:
https://aclanthology.org/2022.findings-emnlp.90.mp4