ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments

Ruolan Yang, Zitong Li, Haifeng Tang, Kenny Zhu


Abstract
Existing automatic evaluation systems for chatbots mostly rely on static chat scripts as ground truth, which are hard to obtain and require access to the bots' underlying models as a form of "white-box testing". Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other as in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently of their model architectures and of the domains for which they are trained.
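To make the tournament idea concrete, the following Python sketch shows one plausible shape of such a framework: a round-robin schedule in which every pair of bots holds a conversation, a pluggable metric scores each side, and accumulated points produce a ranking. This is an illustration only, assuming Python 3.9+; all names here (Chatbot, EchoBot, play_match, score_match, run_tournament) and the toy lexical-diversity metric are hypothetical and are not taken from the ChatMatch paper or its released code.

    # Hypothetical round-robin chat tournament with a pluggable scoring metric.
    # Not the authors' implementation; a minimal sketch of the general idea.
    import itertools
    from typing import Callable, Protocol


    class Chatbot(Protocol):
        """Minimal interface a competing bot must expose (assumed)."""
        name: str
        def reply(self, history: list[str]) -> str: ...


    class EchoBot:
        """Toy stand-in for a real chatbot model: paraphrases the last utterance."""
        def __init__(self, name: str, prefix: str):
            self.name = name
            self.prefix = prefix

        def reply(self, history: list[str]) -> str:
            last = history[-1] if history else "Hello!"
            return f"{self.prefix} {last}"


    def play_match(a: Chatbot, b: Chatbot, turns: int = 4) -> list[str]:
        """Let two bots converse for a fixed number of alternating turns."""
        history: list[str] = []
        speakers = itertools.cycle([a, b])  # a speaks first
        for _ in range(turns):
            history.append(next(speakers).reply(history))
        return history


    def score_match(history: list[str]) -> tuple[float, float]:
        """Hypothetical flexible metric: lexical diversity of each side's turns.
        Even-indexed turns belong to the first bot, odd-indexed to the second."""
        def diversity(utterances: list[str]) -> float:
            words = " ".join(utterances).split()
            return len(set(words)) / max(len(words), 1)
        return diversity(history[0::2]), diversity(history[1::2])


    def run_tournament(
        bots: list[Chatbot],
        metric: Callable[[list[str]], tuple[float, float]],
    ) -> list[tuple[str, float]]:
        """Round-robin: every pair plays once; the higher-scoring side earns a win."""
        points = {bot.name: 0.0 for bot in bots}
        for a, b in itertools.combinations(bots, 2):
            score_a, score_b = metric(play_match(a, b))
            if score_a > score_b:
                points[a.name] += 1
            elif score_b > score_a:
                points[b.name] += 1
            else:  # draw: half a point each
                points[a.name] += 0.5
                points[b.name] += 0.5
        return sorted(points.items(), key=lambda kv: kv[1], reverse=True)


    if __name__ == "__main__":
        bots = [
            EchoBot("terse", "Hm,"),
            EchoBot("verbose", "That is interesting, because"),
            EchoBot("curious", "Why do you say"),
        ]
        for name, pts in run_tournament(bots, score_match):
            print(f"{name}: {pts} points")

Because the metric is passed in as a function, it can be swapped without touching the tournament loop, which matches the paper's emphasis on flexible scoring metrics and independence from model architecture.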
Anthology ID:
2022.acl-long.522
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Note:
Pages:
7579–7590
URL:
https://aclanthology.org/2022.acl-long.522
DOI:
10.18653/v1/2022.acl-long.522
Cite (ACL):
Ruolan Yang, Zitong Li, Haifeng Tang, and Kenny Zhu. 2022. ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7579–7590, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments (Yang et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.522.pdf
Software:
 2022.acl-long.522.software.zip
Code:
 ruolanyang/chatmatch