Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems

Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, Jun Suzuki


Abstract
Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses two limitations of existing dialogue collection methods: (i) the inability to compare against systems that are not publicly available, and (ii) vulnerability to cheating through intentional selection of the systems to be compared. Experimental results show that automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates with human subjective judgments as strongly as existing methods do.
Anthology ID:
2022.aacl-srw.2
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
November
Year:
2022
Address:
Online
Editors:
Yan Hanqi, Yang Zonghan, Sebastian Ruder, Wan Xiaojun
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
8–16
URL:
https://aclanthology.org/2022.aacl-srw.2
DOI:
10.18653/v1/2022.aacl-srw.2
Cite (ACL):
Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, and Jun Suzuki. 2022. Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 8–16, Online. Association for Computational Linguistics.
Cite (Informal):
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems (Sato et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-srw.2.pdf