Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation

Sheila Castilho


Abstract
Document-level human evaluation of machine translation (MT) has been attracting increasing interest in the community. However, little is known about the issues involved in using document-level methodologies to assess MT quality. In this article, we compare inter-annotator agreement (IAA) scores, the effort required to assess quality with different document-level methodologies, and the issue of misevaluation when sentences are evaluated out of context.
Anthology ID:
2021.humeval-1.4
Volume:
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
Month:
April
Year:
2021
Address:
Online
Editors:
Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
Venue:
HumEval
Publisher:
Association for Computational Linguistics
Pages:
34–45
URL:
https://aclanthology.org/2021.humeval-1.4
Cite (ACL):
Sheila Castilho. 2021. Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pages 34–45, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation (Castilho, HumEval 2021)
PDF:
https://aclanthology.org/2021.humeval-1.4.pdf
Video:
https://www.youtube.com/watch?v=djkFwF2RJ74
Video:
https://aclanthology.org/2021.humeval-1.4.mp4