Human Raters Cannot Distinguish English Translations from Original English Texts

Shira Wein


Abstract
The term translationese describes the set of linguistic features unique to translated texts, which appear regardless of translation quality. Though automatic classifiers designed to distinguish translated texts achieve high accuracy and prior work has identified common hallmarks of translationese, human accuracy of identifying translated text is understudied. In this work, we perform a human evaluation of English original/translated texts in order to explore raters’ ability to classify texts as being original or translated English and the features that lead a rater to judge text as being translated. Ultimately, we find that, regardless of the annotators’ native language or the source language of the text, annotators are unable to distinguish translations from original English texts and also have low agreement. Our results provide critical insight into work in translation studies and context for assessments of translationese classifiers.
Anthology ID:
2023.emnlp-main.754
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12266–12272
URL:
https://aclanthology.org/2023.emnlp-main.754
DOI:
10.18653/v1/2023.emnlp-main.754
Cite (ACL):
Shira Wein. 2023. Human Raters Cannot Distinguish English Translations from Original English Texts. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12266–12272, Singapore. Association for Computational Linguistics.
Cite (Informal):
Human Raters Cannot Distinguish English Translations from Original English Texts (Wein, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.754.pdf
Video:
https://aclanthology.org/2023.emnlp-main.754.mp4