A Reproduction Study of the Human Evaluation of Role-Oriented Dialogue Summarization Models

Mingqi Gao, Jie Ruan, Xiaojun Wan


Abstract
This paper reports a reproduction study of the human evaluation of role-oriented dialogue summarization models, conducted as part of the ReproNLP Shared Task 2023 on Reproducibility of Evaluations in NLP. We describe how the experimental design of our reproduction study differs from that of the original study, and report the outcomes obtained. Inter-annotator agreement is lower in the reproduction study than in the original study (0.40 vs. 0.48). Of the six conclusions drawn in the original study, four are validated by our reproduction study. We confirm the effectiveness of the proposed approach on the overall metric, although its relative performance is slightly worse than in the original study. Finally, we raise an open question: how can subjective practices in the original study be identified and addressed when conducting reproduction studies?
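The abstract does not state which agreement coefficient underlies the reported 0.40 and 0.48. Purely as an illustration, the minimal sketch below computes Krippendorff's alpha over a small hypothetical annotator-by-item rating matrix using the open-source krippendorff Python package; the choice of coefficient, the rating scale, and the toy data are all assumptions, not the authors' actual setup.

```python
# Illustrative only: one common way to compute inter-annotator agreement.
# The coefficient (Krippendorff's alpha) and the toy ratings are assumptions;
# the paper's 0.40/0.48 figures may be based on a different measure and data.
import numpy as np
import krippendorff  # pip install krippendorff

# Rows = annotators, columns = items; np.nan marks a missing rating.
# Hypothetical 1-5 quality ratings from three annotators on six summaries.
ratings = np.array([
    [4, 3, 5, 2, 4, np.nan],
    [4, 2, 5, 3, 3, 4],
    [3, 3, 4, 2, 4, 4],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```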
Anthology ID: 2023.humeval-1.10
Volume: Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
Month: September
Year: 2023
Address: Varna, Bulgaria
Editors: Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
Venues: HumEval | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 124–129
URL: https://aclanthology.org/2023.humeval-1.10
Cite (ACL): Mingqi Gao, Jie Ruan, and Xiaojun Wan. 2023. A Reproduction Study of the Human Evaluation of Role-Oriented Dialogue Summarization Models. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 124–129, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): A Reproduction Study of the Human Evaluation of Role-Oriented Dialogue Summarization Models (Gao et al., HumEval-WS 2023)
PDF: https://aclanthology.org/2023.humeval-1.10.pdf