Junhao Ruan


2024

Exploiting Target Language Data for Neural Machine Translation Beyond Back Translation
Abudurexiti Reheman | Yingfeng Luo | Junhao Ruan | Chunliang Zhang | Anxiang Ma | Tong Xiao | JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2024

Neural Machine Translation (NMT) struggles when translating in new domains and for low-resource languages. To address these issues, researchers have proposed methods that integrate additional knowledge into NMT, such as translation memories (TMs). However, finding TMs that closely match the input sentence remains challenging, particularly in specific domains. By contrast, monolingual data is widely available for most languages, and back-translation is seen as a promising approach for utilizing target language data; however, back-translation still requires additional training. In this paper, we introduce Pseudo-kNN-MT, a variant of k-nearest neighbor machine translation (kNN-MT) that utilizes target language data by constructing a pseudo datastore. Furthermore, we investigate the utility of large language models (LLMs) for the kNN component. Experimental results demonstrate that our approach exhibits strong domain adaptation capability in both high-resource and low-resource machine translation. Notably, LLMs are found to be beneficial for robust NMT systems.