Open-Domain Dialog Evaluation Using Follow-Ups Likelihood

Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans


Abstract
Automatic evaluation of open-domain dialogs remains an unsolved problem. Existing methods do not correlate strongly with human annotations. In this paper, we present a new automated evaluation method based on the use of follow-ups. We measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., "not really relevant here", "what are you trying to say?"). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
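
The core idea is simple to sketch: append a candidate follow-up to the dialog and read off the likelihood the language model assigns to the follow-up tokens. Below is a minimal illustration in Python, assuming an off-the-shelf GPT-2 causal LM via Hugging Face Transformers and the two negative follow-ups quoted in the abstract; the paper's actual model, follow-up set, and score aggregation may differ.

```python
# Minimal sketch of follow-up likelihood scoring (not the paper's exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumption: any causal LM works
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical negative follow-ups, taken from the examples in the abstract.
NEGATIVE_FOLLOW_UPS = [
    "not really relevant here",
    "what are you trying to say?",
]

def follow_up_log_likelihood(context: str, follow_up: str) -> float:
    """Average log-probability the LM assigns to `follow_up` given `context`."""
    ctx_ids = tokenizer.encode(context)
    fu_ids = tokenizer.encode(" " + follow_up)
    input_ids = torch.tensor([ctx_ids + fu_ids])
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits[0, i] predicts token i+1, so drop the last position and take log-probs.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Positions in `log_probs` that predict each follow-up token.
    fu_positions = range(len(ctx_ids) - 1, len(ctx_ids) + len(fu_ids) - 1)
    token_lls = [log_probs[pos, tok].item() for pos, tok in zip(fu_positions, fu_ids)]
    return sum(token_lls) / len(token_lls)

dialog = "User: Do you like pizza?\nBot: The weather is nice today.\n"
# Negative follow-ups being likely signals a poor response, so negate the sum
# to obtain a score where higher is better.
score = -sum(follow_up_log_likelihood(dialog, f) for f in NEGATIVE_FOLLOW_UPS)
print(score)
```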
Anthology ID:
2022.coling-1.40
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
496–504
URL:
https://aclanthology.org/2022.coling-1.40
Cite (ACL):
Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, and Walter Daelemans. 2022. Open-Domain Dialog Evaluation Using Follow-Ups Likelihood. In Proceedings of the 29th International Conference on Computational Linguistics, pages 496–504, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Open-Domain Dialog Evaluation Using Follow-Ups Likelihood (De Bruyn et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.40.pdf
Code:
maximedb/full
Data:
FED