BotEval: Facilitating Interactive Human Evaluation

Hyundong Cho, Thamme Gowda, Yuyang Huang, Zixun Lu, Tianli Tong, Jonathan May


Abstract
Following the rapid progress in natural language processing (NLP) models, language models are applied to increasingly complex interactive tasks such as negotiations and conversation moderation. Having human evaluators directly interact with these NLP models is essential for adequately evaluating their performance on such interactive tasks. We develop BotEval, an easily customizable, open-source evaluation toolkit that focuses on enabling human-bot interactions as part of the evaluation process, as opposed to human evaluators making judgements on a static input. BotEval balances flexibility for customization and user-friendliness by providing templates for common use cases that span various degrees of complexity and built-in compatibility with popular crowdsourcing platforms. We showcase the numerous useful features of BotEval through a study that evaluates the effectiveness of various chatbots at conversational moderation, and we discuss how BotEval differs from other annotation tools.
Anthology ID:
2024.acl-demos.11
Volume:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Yixin Cao, Yang Feng, Deyi Xiong
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
107–116
URL:
https://aclanthology.org/2024.acl-demos.11/
DOI:
10.18653/v1/2024.acl-demos.11
Cite (ACL):
Hyundong Cho, Thamme Gowda, Yuyang Huang, Zixun Lu, Tianli Tong, and Jonathan May. 2024. BotEval: Facilitating Interactive Human Evaluation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 107–116, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
BotEval: Facilitating Interactive Human Evaluation (Cho et al., ACL 2024)
PDF:
https://aclanthology.org/2024.acl-demos.11.pdf