Lost in Translation, and Found: Detecting and Interpreting Translation Effects

Shira Wein; Anna Serbina; Jiyuan Ji; Nathan Wolf; Jason DeGraaff; Prajakta Kini; Maria Leonor Pacheco

Abstract

Translationese refers to the statistical patterns, often subtle and imperceptible to human readers, that distinguish translated texts from original texts. When translated texts appear in either training or testing data, these patterns can negatively affect model performance or warp model evaluation. We approach the task of discerning whether a text was originally written in English or translated into English by fine-tuning contemporary foundation models at distinct item lengths and achieve state-of-the-art performance (94% Macro F1). Because these linguistic cues are hard for humans to perceive, we analyze the features that enable our model’s high performance. Employing a suite of interpretability-based techniques, we find that: (1) our high accuracy is enabled by a collection of linguistic features, a number of which correspond with linguistic theories of translationese, and (2) pretrained neural models are adept at picking up these features without any fine-tuning.

Anthology ID: 2026.acl-long.781
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 17172–17187
URL: https://aclanthology.org/2026.acl-long.781/
Cite (ACL): Shira Wein, Anna Serbina, Jiyuan Ji, Nathan Wolf, Jason DeGraaff, Prajakta Kini, and Maria Leonor Pacheco. 2026. Lost in Translation, and Found: Detecting and Interpreting Translation Effects. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17172–17187, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Lost in Translation, and Found: Detecting and Interpreting Translation Effects (Wein et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.781.pdf
Checklist: 2026.acl-long.781.checklist.pdf
