BibTeX
@inproceedings{li-koehn-2023-learning,
title = "Learning from Mistakes: Towards Robust Neural Machine Translation for Disfluent {L}2 Sentences",
author = "Li, Shuyue Stella and
Koehn, Philipp",
editor = "Utiyama, Masao and
Wang, Rui",
booktitle = "Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track",
month = sep,
year = "2023",
address = "Macau SAR, China",
publisher = "Asia-Pacific Association for Machine Translation",
url = "https://aclanthology.org/2023.mtsummit-research.20",
pages = "235--247",
abstract = "We study the sentences written by second-language (L2) learners to improve the robustness of current neural machine translation (NMT) models on this type of data. Current large datasets used to train NMT systems are mostly Wikipedia or government documents written by highly competent speakers of that language, especially English. However, given that English is the most common second language, it is crucial that machine translation systems are robust against the large number of sentences written by L2 learners of English. By studying the difficulties faced by humans in their L2 acquisition process, we are able to transfer such insights to machine translation systems to recover from source-side fluency variations. In this work, we create additional training data with artificial errors similar to mistakes made by L2 learners of various fluency levels to improve the quality of the machine translation system. We test our method in zero-shot settings on the JFLEG-es (English-Spanish) dataset. The quality of our machine translation system on disfluent sentences outperforms the baseline by 1.8 BLEU scores.",
}
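The abstract above describes augmenting NMT training data with artificial errors that mimic the mistakes of L2 English learners at varying fluency levels, so the model learns to recover fluent translations from disfluent input. As a rough illustration of that general idea — not the authors' implementation; all function names and the specific error types below are hypothetical — a noise-injection pass over clean English source sentences might look like:

```python
# Illustrative sketch only: inject L2-learner-style errors into clean English
# source sentences to build augmented NMT training data. The error types and
# rates here are hypothetical examples, not taken from the paper.
import random

ARTICLES = {"a", "an", "the"}
PREPOSITIONS = {"in", "on", "at", "to", "for", "of", "with"}

def drop_articles(tokens, p):
    # Article omission, a common error for learners whose L1 lacks articles.
    return [t for t in tokens if not (t.lower() in ARTICLES and random.random() < p)]

def swap_prepositions(tokens, p):
    # Replace a preposition with a different randomly chosen one.
    out = []
    for t in tokens:
        if t.lower() in PREPOSITIONS and random.random() < p:
            out.append(random.choice(sorted(PREPOSITIONS - {t.lower()})))
        else:
            out.append(t)
    return out

def swap_adjacent(tokens, p):
    # Local word-order errors: occasionally swap neighboring tokens.
    tokens = tokens[:]
    i = 0
    while i < len(tokens) - 1:
        if random.random() < p:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2
        else:
            i += 1
    return tokens

def make_disfluent(sentence, fluency=0.5):
    # Lower fluency -> higher error probability, mimicking less fluent learners.
    p = 0.3 * (1.0 - fluency)
    tokens = sentence.split()
    for noise_fn in (drop_articles, swap_prepositions, swap_adjacent):
        tokens = noise_fn(tokens, p)
    return " ".join(tokens)

if __name__ == "__main__":
    random.seed(0)
    src = "The students went to the library in the afternoon"
    print(make_disfluent(src, fluency=0.2))
```

Each noised source sentence would then be paired with the original clean reference translation, so that training teaches the model to produce fluent output from source-side fluency variations, as the abstract describes.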
MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="li-koehn-2023-learning">
<titleInfo>
<title>Learning from Mistakes: Towards Robust Neural Machine Translation for Disfluent L2 Sentences</title>
</titleInfo>
<name type="personal">
<namePart type="given">Shuyue</namePart>
<namePart type="given">Stella</namePart>
<namePart type="family">Li</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Philipp</namePart>
<namePart type="family">Koehn</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2023-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track</title>
</titleInfo>
<name type="personal">
<namePart type="given">Masao</namePart>
<namePart type="family">Utiyama</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rui</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Asia-Pacific Association for Machine Translation</publisher>
<place>
<placeTerm type="text">Macau SAR, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>We study the sentences written by second-language (L2) learners to improve the robustness of current neural machine translation (NMT) models on this type of data. Current large datasets used to train NMT systems are mostly Wikipedia or government documents written by highly competent speakers of that language, especially English. However, given that English is the most common second language, it is crucial that machine translation systems are robust against the large number of sentences written by L2 learners of English. By studying the difficulties faced by humans in their L2 acquisition process, we are able to transfer such insights to machine translation systems to recover from source-side fluency variations. In this work, we create additional training data with artificial errors similar to mistakes made by L2 learners of various fluency levels to improve the quality of the machine translation system. We test our method in zero-shot settings on the JFLEG-es (English-Spanish) dataset. The quality of our machine translation system on disfluent sentences outperforms the baseline by 1.8 BLEU scores.</abstract>
<identifier type="citekey">li-koehn-2023-learning</identifier>
<location>
<url>https://aclanthology.org/2023.mtsummit-research.20</url>
</location>
<part>
<date>2023-09</date>
<extent unit="page">
<start>235</start>
<end>247</end>
</extent>
</part>
</mods>
</modsCollection>
Endnote
%0 Conference Proceedings
%T Learning from Mistakes: Towards Robust Neural Machine Translation for Disfluent L2 Sentences
%A Li, Shuyue Stella
%A Koehn, Philipp
%Y Utiyama, Masao
%Y Wang, Rui
%S Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track
%D 2023
%8 September
%I Asia-Pacific Association for Machine Translation
%C Macau SAR, China
%F li-koehn-2023-learning
%X We study the sentences written by second-language (L2) learners to improve the robustness of current neural machine translation (NMT) models on this type of data. Current large datasets used to train NMT systems are mostly Wikipedia or government documents written by highly competent speakers of that language, especially English. However, given that English is the most common second language, it is crucial that machine translation systems are robust against the large number of sentences written by L2 learners of English. By studying the difficulties faced by humans in their L2 acquisition process, we are able to transfer such insights to machine translation systems to recover from source-side fluency variations. In this work, we create additional training data with artificial errors similar to mistakes made by L2 learners of various fluency levels to improve the quality of the machine translation system. We test our method in zero-shot settings on the JFLEG-es (English-Spanish) dataset. The quality of our machine translation system on disfluent sentences outperforms the baseline by 1.8 BLEU scores.
%U https://aclanthology.org/2023.mtsummit-research.20
%P 235-247
Markdown (Informal)
[Learning from Mistakes: Towards Robust Neural Machine Translation for Disfluent L2 Sentences](https://aclanthology.org/2023.mtsummit-research.20) (Li & Koehn, MTSummit 2023)
ACL
Shuyue Stella Li and Philipp Koehn. 2023. Learning from Mistakes: Towards Robust Neural Machine Translation for Disfluent L2 Sentences. In Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track, pages 235–247, Macau SAR, China. Asia-Pacific Association for Machine Translation.