Answerability: A custom metric for evaluating chatbot performance

Pranav Gupta; Anand A. Rajasekar; Amisha Patel; Mandar Kulkarni; Alexander Sunell; Kyung Kim; Krishnan Ganapathy; Anusua Trivedi

doi:10.18653/v1/2022.gem-1.27

Abstract

Most commercial conversational AI products in domains spanning e-commerce, health care, finance, and education involve a hierarchy of NLP models that perform a variety of tasks such as classification, entity recognition, question answering, sentiment detection, and semantic text similarity. Despite our understanding of each constituent model, we do not have a clear view of how these models affect the overall platform metrics. To bridge this gap, we define a metric called answerability, which penalizes not only irrelevant or incorrect chatbot responses but also unhelpful responses that do not serve the chatbot’s purpose despite being correct or relevant. Additionally, we describe a formula-based mathematical framework that relates individual model metrics to the answerability metric. We also describe a modeling approach for predicting answerability from a user question and its corresponding chatbot response.
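The abstract does not give the formula, so the following is only a minimal Python sketch under stated assumptions, not the authors' definition or their model-metric framework. It assumes each user question and chatbot response pair carries three boolean judgments (relevant, correct, helpful) and treats answerability as the fraction of pairs that pass all three, so a correct but unhelpful reply is still penalized. The names `JudgedTurn`, `is_answerable`, and `answerability` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JudgedTurn:
    """Hypothetical judgments for one user question / chatbot response pair."""
    relevant: bool
    correct: bool
    helpful: bool  # assumed: the response serves the chatbot's purpose


def is_answerable(turn: JudgedTurn) -> bool:
    # Assumption: a response counts only if it is relevant, correct AND helpful,
    # so correct-but-unhelpful responses are penalized as well.
    return turn.relevant and turn.correct and turn.helpful


def answerability(turns: Iterable[JudgedTurn]) -> float:
    """Fraction of judged turns that are answerable (hypothetical aggregate)."""
    turns = list(turns)
    if not turns:
        raise ValueError("need at least one judged turn")
    return sum(is_answerable(t) for t in turns) / len(turns)


# Usage: two of these four illustrative turns pass all three checks.
example_turns = [
    JudgedTurn(relevant=True, correct=True, helpful=True),
    JudgedTurn(relevant=True, correct=True, helpful=False),  # correct but unhelpful
    JudgedTurn(relevant=False, correct=True, helpful=True),  # irrelevant
    JudgedTurn(relevant=True, correct=True, helpful=True),
]
print(answerability(example_turns))
```

Averaging a strict all-three check keeps the illustration simple; a graded or weighted score would be an equally plausible reading of the abstract.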

Anthology ID: 2022.gem-1.27
Volume: Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Antoine Bosselut, Khyathi Chandu, Kaustubh Dhole, Varun Gangal, Sebastian Gehrmann, Yacine Jernite, Jekaterina Novikova, Laura Perez-Beltrachini
Venue: GEM
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 316–325
URL: https://aclanthology.org/2022.gem-1.27/
Cite (ACL): Pranav Gupta, Anand A. Rajasekar, Amisha Patel, Mandar Kulkarni, Alexander Sunell, Kyung Kim, Krishnan Ganapathy, and Anusua Trivedi. 2022. Answerability: A custom metric for evaluating chatbot performance. In Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 316–325, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Answerability: A custom metric for evaluating chatbot performance (Gupta et al., GEM 2022)
PDF: https://aclanthology.org/2022.gem-1.27.pdf
Video: https://aclanthology.org/2022.gem-1.27.mp4