DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation

ChaeHun Park, Seungil Lee, Daniel Rim, Jaegul Choo


Abstract
Despite recent advances in open-domain dialogue systems, building a reliable evaluation metric is still a challenging problem. Recent studies have proposed learnable metrics based on classification models trained to distinguish the correct response. However, neural classifiers are known to make overly confident predictions for examples from unseen distributions. We propose DEnsity, which evaluates a response by utilizing density estimation on the feature space derived from a neural classifier. Our metric measures how likely a response is to appear in the distribution of human conversations. Moreover, to improve the performance of DEnsity, we utilize contrastive learning to further compress the feature space. Experiments on multiple response evaluation datasets show that DEnsity correlates better with human evaluations than existing metrics.
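The idea of scoring a response by its density in a feature space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which density estimator is used, so a diagonal Gaussian is assumed here purely for illustration. The function names (`fit_diagonal_gaussian`, `log_density`) and the toy 2-d vectors are hypothetical; in practice the features would come from a trained neural classifier.

```python
import math
from statistics import fmean, pvariance


def fit_diagonal_gaussian(features, eps=1e-6):
    """Fit an independent Gaussian per feature dimension.

    `features` is a list of equal-length float vectors, standing in for
    classifier features of human-written (context, response) pairs.
    """
    dims = list(zip(*features))  # transpose: one tuple per dimension
    means = [fmean(d) for d in dims]
    # eps keeps the variance strictly positive for constant dimensions
    variances = [pvariance(d, mu) + eps for d, mu in zip(dims, means)]
    return means, variances


def log_density(x, means, variances):
    """Log-density of feature vector x under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


# Toy 2-d "features" (assumed data, not from the paper).
human_features = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9], [1.0, 1.2], [0.8, 1.0]]
means, variances = fit_diagonal_gaussian(human_features)

# A vector near the human data should score higher than one far from it.
typical = log_density([1.0, 1.0], means, variances)
atypical = log_density([4.0, -3.0], means, variances)
```

Under this assumption, a higher log-density means the response's features lie closer to those of human conversations, which is the quantity the abstract describes the metric as measuring.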
Anthology ID:
2023.findings-acl.896
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14222–14236
URL:
https://aclanthology.org/2023.findings-acl.896
DOI:
10.18653/v1/2023.findings-acl.896
Cite (ACL):
ChaeHun Park, Seungil Lee, Daniel Rim, and Jaegul Choo. 2023. DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation. In Findings of the Association for Computational Linguistics: ACL 2023, pages 14222–14236, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation (Park et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.896.pdf
Video:
https://aclanthology.org/2023.findings-acl.896.mp4