%0 Conference Proceedings
%T Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation
%A Castilho, Sheila
%Y Belz, Anya
%Y Agarwal, Shubham
%Y Graham, Yvette
%Y Reiter, Ehud
%Y Shimorina, Anastasia
%S Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
%D 2021
%8 April
%I Association for Computational Linguistics
%C Online
%F castilho-2021-towards
%X Document-level human evaluation of machine translation (MT) has been gaining interest in the community. However, little is known about the issues that arise when document-level methodologies are used to assess MT quality. In this article, we compare inter-annotator agreement (IAA) scores and the effort required to assess quality across different document-level methodologies, and we examine the issue of misevaluation when sentences are evaluated out of context.
%U https://aclanthology.org/2021.humeval-1.4
%P 34-45