Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation

Sheila Castilho


Abstract
Document-level human evaluation of machine translation (MT) has been attracting increasing interest in the community. However, little is known about the issues involved in using document-level methodologies to assess MT quality. In this article, we compare inter-annotator agreement (IAA) scores, the effort required to assess quality with different document-level methodologies, and the issue of misevaluation when sentences are evaluated out of context.
Anthology ID:
2021.humeval-1.4
Volume:
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
Month:
April
Year:
2021
Address:
Online
Editors:
Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
Venue:
HumEval
Publisher:
Association for Computational Linguistics
Pages:
34–45
URL:
https://aclanthology.org/2021.humeval-1.4
Cite (ACL):
Sheila Castilho. 2021. Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pages 34–45, Online. Association for Computational Linguistics.
Cite (Informal):
Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation (Castilho, HumEval 2021)
PDF:
https://aclanthology.org/2021.humeval-1.4.pdf
Video:
https://www.youtube.com/watch?v=djkFwF2RJ74
Video:
https://aclanthology.org/2021.humeval-1.4.mp4