An Analysis of Surprisal Uniformity in Machine and Human Translations

Josef Jon, Ondřej Bojar


Abstract
This study examines neural machine translation (NMT) and its performance on texts that diverge from typical standards, focusing on how information is organized within sentences. We analyze surprisal distributions in source texts, human translations, and machine translations across several datasets to determine whether NMT systems naturally promote a uniform density of surprisal in their translations, even when the original texts do not adhere to this principle. The findings reveal that NMT output aligns more closely with the source texts in terms of surprisal uniformity than human translations do. We also analyzed the absolute values of the surprisal uniformity measures, expecting human translations to be less uniform. Contrary to this initial hypothesis, we did not find comprehensive evidence for the claim, although some results suggest it might hold for very diverse texts, such as poetry.
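As an illustration of what a surprisal uniformity measure can look like, here is a minimal Python sketch. The abstract does not say which measures the paper uses, so the two shown here (variance around the sentence mean, and mean squared difference between neighbouring tokens) are assumptions chosen for illustration. The function names and the input format, a list of per-token probabilities from some language model, are also hypothetical.

```python
import math


def surprisals(token_probs):
    """Convert per-token probabilities into surprisals in bits: -log2(p)."""
    return [-math.log2(p) for p in token_probs]


def global_variance(values):
    """Variance of surprisal around the sentence mean (lower = more uniform)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def local_variance(values):
    """Mean squared change in surprisal between adjacent tokens."""
    if len(values) < 2:
        return 0.0
    diffs = [(b - a) ** 2 for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical token probabilities for one sentence (assumed, not from the paper).
example = surprisals([0.5, 0.25, 0.5, 0.125])
print(global_variance(example), local_variance(example))
```

Under these assumed measures, a perfectly uniform sentence scores 0 on both, and comparing the scores of a source sentence with those of its human and machine translations gives one way to ask which translation tracks the source more closely.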
Anthology ID:
2024.ctt-1.5
Volume:
Proceedings of the 1st Workshop on Creative-text Translation and Technology
Month:
June
Year:
2024
Address:
Sheffield, United Kingdom
Editors:
Bram Vanroy, Marie-Aude Lefer, Lieve Macken, Paola Ruffo
Venues:
CTT | WS
Publisher:
European Association for Machine Translation
Pages:
40–56
URL:
https://aclanthology.org/2024.ctt-1.5
Cite (ACL):
Josef Jon and Ondřej Bojar. 2024. An Analysis of Surprisal Uniformity in Machine and Human Translations. In Proceedings of the 1st Workshop on Creative-text Translation and Technology, pages 40–56, Sheffield, United Kingdom. European Association for Machine Translation.
Cite (Informal):
An Analysis of Surprisal Uniformity in Machine and Human Translations (Jon & Bojar, CTT-WS 2024)
PDF:
https://aclanthology.org/2024.ctt-1.5.pdf