ChatEval: A Tool for Chatbot Evaluation

João Sedoc, Daphne Ippolito, Arun Kirubarajan, Jai Thirani, Lyle Ungar, Chris Callison-Burch


Abstract
Open-domain dialog systems (i.e., chatbots) are difficult to evaluate. The current best practice for analyzing and comparing these dialog systems is the use of human judgments. However, the lack of standardization in evaluation procedures and the fact that model parameters and code are rarely published hinder systematic human evaluation experiments. We introduce a unified framework for human evaluation of chatbots that augments existing tools and provides a web-based hub for researchers to share and compare their dialog systems. Researchers can submit their trained models to the ChatEval web interface and obtain comparisons with baselines and prior work. The evaluation code is open-source to ensure standardization and transparency. In addition, we introduce open-source baseline models and evaluation datasets. ChatEval can be found at https://chateval.org.
Anthology ID: N19-4011
Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Waleed Ammar, Annie Louis, Nasrin Mostafazadeh
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 60–65
URL: https://aclanthology.org/N19-4011
DOI: 10.18653/v1/N19-4011
Cite (ACL): João Sedoc, Daphne Ippolito, Arun Kirubarajan, Jai Thirani, Lyle Ungar, and Chris Callison-Burch. 2019. ChatEval: A Tool for Chatbot Evaluation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 60–65, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): ChatEval: A Tool for Chatbot Evaluation (Sedoc et al., NAACL 2019)
PDF: https://aclanthology.org/N19-4011.pdf