Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

Katherine Thai; Marzena Karpinska; Kalpesh Krishna; Bill Ray; Moira Inghilleri; John Wieting; Mohit Iyyer

doi:10.18653/v1/2022.emnlp-main.672

Abstract

Literary translation is a culturally significant task, but it is bottlenecked by the small number of qualified literary translators relative to the many untranslated works published around the world. Machine translation (MT) holds potential to complement the work of human translators by improving both training procedures and their overall efficiency. Literary translation is less constrained than more traditional MT settings since translators must balance meaning equivalence, readability, and critical interpretability in the target language. This property, along with the complex discourse-level context present in literary texts, also makes literary MT more challenging to computationally model and evaluate. To explore this task, we collect a dataset (Par3) of non-English language novels in the public domain, each aligned at the paragraph level to both human and automatic English translations. Using Par3, we discover that expert literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%, while state-of-the-art automatic MT metrics do not correlate with those preferences. The experts note that MT outputs contain not only mistranslations, but also discourse-disrupting errors and stylistic inconsistencies. To address these problems, we train a post-editing model whose output is preferred over normal MT output at a rate of 69% by experts. We publicly release Par3 to spur future research into literary MT.

Anthology ID: 2022.emnlp-main.672
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 9882–9902
URL: https://aclanthology.org/2022.emnlp-main.672/
DOI: 10.18653/v1/2022.emnlp-main.672
Cite (ACL): Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, and Mohit Iyyer. 2022. Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9882–9902, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature (Thai et al., EMNLP 2022)
PDF: https://aclanthology.org/2022.emnlp-main.672.pdf
