LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing

Yu Li, Josh Arnold, Feifan Yan, Weiyan Shi, Zhou Yu


Abstract
We present LEGOEval, an open-source toolkit that enables researchers to evaluate dialogue systems in a few lines of code using the online crowdsourcing platform Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval offers flexible task design through a Python API that maps to commonly used React.js interface components. Researchers can personalize their evaluation procedures with our built-in pages as if playing with LEGO blocks; LEGOEval thus provides a fast, consistent way to reproduce human evaluation results. Beyond its flexible task design, LEGOEval also offers a simple API for reviewing collected data.
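To illustrate the page-composition idea described in the abstract, the Python sketch below shows how an evaluation task might be assembled from reusable page components. The class names and methods (Page, ChatPage, LikertPage, Task, add, preview) are hypothetical placeholders invented for illustration and are not taken from LEGOEval's actual API; see the linked repository for the real interface.

```python
# Hypothetical sketch of a "LEGO-block" style evaluation task: reusable page
# components composed into one crowdsourcing task. All names here are
# illustrative only and are NOT LEGOEval's actual API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """A single interface page shown to a crowd worker."""
    name: str


@dataclass
class ChatPage(Page):
    """Page where the worker converses with the dialogue system."""
    bot_endpoint: str = "http://localhost:5000/chat"  # assumed local bot URL


@dataclass
class LikertPage(Page):
    """Page collecting Likert-scale ratings after the conversation."""
    questions: List[str] = field(default_factory=list)


@dataclass
class Task:
    """An evaluation task assembled from an ordered list of pages."""
    title: str
    pages: List[Page] = field(default_factory=list)

    def add(self, page: Page) -> "Task":
        self.pages.append(page)
        return self

    def preview(self) -> None:
        # A real toolkit would render interface components (e.g. React.js);
        # this sketch only prints the task structure.
        print(f"Task: {self.title}")
        for i, page in enumerate(self.pages, 1):
            print(f"  {i}. {page.name} ({type(page).__name__})")


if __name__ == "__main__":
    task = (
        Task(title="Dialogue system evaluation")
        .add(ChatPage(name="chat"))
        .add(LikertPage(name="ratings",
                        questions=["How coherent were the responses?",
                                   "How engaging was the conversation?"]))
    )
    task.preview()
```

The point of the sketch is the design idea the paper describes: each page is a self-contained block, and a task is just an ordered composition of blocks, which makes evaluation setups easy to modify and to reproduce.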
Anthology ID:
2021.acl-demo.38
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations
Month:
August
Year:
2021
Address:
Online
Editors:
Heng Ji, Jong C. Park, Rui Xia
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
317–324
URL:
https://aclanthology.org/2021.acl-demo.38
DOI:
10.18653/v1/2021.acl-demo.38
Cite (ACL):
Yu Li, Josh Arnold, Feifan Yan, Weiyan Shi, and Zhou Yu. 2021. LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pages 317–324, Online. Association for Computational Linguistics.
Cite (Informal):
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing (Li et al., ACL-IJCNLP 2021)
PDF:
https://aclanthology.org/2021.acl-demo.38.pdf
Video:
https://aclanthology.org/2021.acl-demo.38.mp4
Code:
yooli23/LEGOEval