Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese

Yikang Liu; Wanyang Zhang; Yiming Wang; Jialong Tang; Pei Zhang; Baosong Yang; Fei Huang; Rui Wang; Hai Hu

Abstract

Translationese refers to linguistic properties that usually occur in translated texts. Previous work studies translationese by framing it as a binary classification between original and translated texts. In this paper, we argue that translationese should be graded instead of binary, and we propose the first measure for translationese: the translationese-index (T-index), computed from the likelihood ratios of two contrastively fine-tuned language models (LMs). We use synthesized translations and translations in the wild to evaluate T-index’s generalizability in cross-domain settings and its validity against human judgments. Our results show that T-index can generalize to unseen genres, authors, and language pairs. Moreover, T-index computed using two 0.5B LMs fine-tuned on only 1-5k pairs of synthetic data can effectively capture translationese, as demonstrated by its alignment with human pointwise ratings and pairwise judgments. Additionally, the correlation between T-index and existing machine translation (MT) quality estimation (QE) metrics such as BLEU and COMET is low, suggesting that T-index is not covered by these metrics and can serve as a complementary metric in MT QE.
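The core idea, scoring a text by how much more likely it is under an LM tuned on translations than under one tuned on original text, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that (a) both fine-tuned models share one tokenizer (e.g. both derived from the same 0.5B base model), so they score the same token sequence; (b) per-token log-probabilities (natural log) have already been extracted from each model; and (c) the score is a length-normalized log-likelihood ratio where higher means more translationese. The function names `mean_log_likelihood` and `t_index` are hypothetical.

```python
import math
from typing import Sequence


def mean_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability of a text under one LM."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def t_index(logprobs_translated_lm: Sequence[float],
            logprobs_original_lm: Sequence[float]) -> float:
    """Hypothetical T-index sketch: a length-normalized log-likelihood ratio.

    Assumed definition (not confirmed by the source):
        mean log p_translated(x) - mean log p_original(x)
    which equals the log of the per-token geometric-mean likelihood ratio.
    A positive value means the text is more probable under the LM tuned on
    translations, i.e. it reads as more translationese.
    """
    # Both models must have scored the same tokens (shared tokenizer assumed).
    if len(logprobs_translated_lm) != len(logprobs_original_lm):
        raise ValueError("both models must score the same token sequence")
    return (mean_log_likelihood(logprobs_translated_lm)
            - mean_log_likelihood(logprobs_original_lm))


if __name__ == "__main__":
    # Made-up per-token log-probs for one sentence under each model.
    translated_lm = [-1.2, -0.8, -2.0]
    original_lm = [-1.5, -1.1, -2.6]
    score = t_index(translated_lm, original_lm)
    print(f"T-index: {score:.3f}")
    # The ratio of average per-token likelihoods between the two models.
    print(f"per-token likelihood ratio: {math.exp(score):.3f}")
```

Because the score is a continuous difference rather than a class label, it yields the graded measurement the abstract argues for: texts can be ranked against each other (as in pairwise judgments) or compared with pointwise ratings. Normalizing by token count keeps long and short texts on a comparable scale; whether and how the paper normalizes is an assumption here.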

Anthology ID:: 2025.emnlp-main.633
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 12532–12549
URL:: https://aclanthology.org/2025.emnlp-main.633/
Cite (ACL):: Yikang Liu, Wanyang Zhang, Yiming Wang, Jialong Tang, Pei Zhang, Baosong Yang, Fei Huang, Rui Wang, and Hai Hu. 2025. Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12532–12549, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese (Liu et al., EMNLP 2025)
PDF:: https://aclanthology.org/2025.emnlp-main.633.pdf
Checklist:: 2025.emnlp-main.633.checklist.pdf