Learning an Unreferenced Metric for Online Dialogue Evaluation

Koustuv Sinha, Prasanna Parthasarathi, Jasmine Wang, Ryan Lowe, William L. Hamilton, Joelle Pineau


Abstract
Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or need a human-generated reference response during inference, making them infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances and leverages the temporal transitions that exist between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
Anthology ID:
2020.acl-main.220
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2430–2441
URL:
https://aclanthology.org/2020.acl-main.220
DOI:
10.18653/v1/2020.acl-main.220
PDF:
https://aclanthology.org/2020.acl-main.220.pdf
Video:
http://slideslive.com/38928843
Code:
facebookresearch/online_dialog_eval
Data:
PERSONA-CHAT