Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting

Emmy Liu; Aditi Chaudhary; Graham Neubig

doi:10.18653/v1/2023.emnlp-main.933

Abstract

Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This characterization allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and the use of retrieval-augmented models. These techniques not only improve the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% in absolute accuracy, but also hold potential benefits for non-idiomatic sentences.
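
The abstract does not spell out how the loss upweighting is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes per-sentence losses are already available and that each sentence carries a boolean flag marking it as potentially idiomatic. The function name `weighted_sentence_loss` and the `idiom_weight` hyperparameter are hypothetical, introduced here for illustration.

```python
from typing import Sequence


def weighted_sentence_loss(
    sentence_losses: Sequence[float],
    is_potentially_idiomatic: Sequence[bool],
    idiom_weight: float = 2.0,
) -> float:
    """Weighted mean of per-sentence losses.

    Sentences flagged as potentially idiomatic count `idiom_weight` times;
    all other sentences count once. `idiom_weight` is a hypothetical
    hyperparameter, not a value taken from the paper.
    """
    if len(sentence_losses) != len(is_potentially_idiomatic):
        raise ValueError("losses and flags must have the same length")
    if not sentence_losses:
        raise ValueError("need at least one sentence")
    weights = [idiom_weight if flag else 1.0 for flag in is_potentially_idiomatic]
    total = sum(w * loss for w, loss in zip(weights, sentence_losses))
    return total / sum(weights)


# Toy batch: only the second sentence is flagged as potentially idiomatic.
losses = [1.0, 4.0, 1.0]
flags = [False, True, False]
uniform = weighted_sentence_loss(losses, flags, idiom_weight=1.0)
upweighted = weighted_sentence_loss(losses, flags, idiom_weight=2.0)
```

With a weight above 1, the flagged sentence contributes more to the batch loss, so gradient updates lean further toward getting potentially idiomatic sentences right. Normalizing by the sum of the weights is one design choice among several; it keeps the loss on a scale comparable to the unweighted mean.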

Anthology ID: 2023.emnlp-main.933
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 15095–15111
URL: https://aclanthology.org/2023.emnlp-main.933/
DOI: 10.18653/v1/2023.emnlp-main.933
Cite (ACL): Emmy Liu, Aditi Chaudhary, and Graham Neubig. 2023. Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15095–15111, Singapore. Association for Computational Linguistics.
Cite (Informal): Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting (Liu et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.933.pdf
Video: https://aclanthology.org/2023.emnlp-main.933.mp4
