A Semantic Uncertainty Sampling Strategy for Back-Translation in Low-Resources Neural Machine Translation

Yepai Jia; Yatu Ji; Xiang Xue; Lei Shi; Qing-Dao-Er-Ji Ren; Nier Wu; Na Liu; Chen Zhao; Fu Liu

doi:10.18653/v1/2025.acl-srw.35

Abstract

Back-translation has been proven effective in enhancing the performance of Neural Machine Translation (NMT); its core mechanism is to synthesize parallel corpora that strengthen model training. However, while traditional back-translation methods alleviate data scarcity in low-resource machine translation, their dependence on random sampling strategies ignores the semantic quality of the monolingual data. As a result, the generated corpora contain a substantial number of low-quality samples that contaminate model training. Mitigating this noise requires additional training iterations or model scaling, which significantly increases computational cost. To address this challenge, this study proposes a Semantic Uncertainty Sampling strategy, which computationally evaluates the complexity of unannotated monolingual data and prioritizes sentences with higher semantic uncertainty as training samples. Experiments were conducted on three typical low-resource agglutinative language pairs: Mongolian-Chinese, Uyghur-Chinese, and Korean-Chinese. Results show an average BLEU improvement of +1.7 on the test sets across all three translation tasks, confirming the method’s effectiveness in enhancing translation accuracy and fluency. This approach provides a novel pathway for the efficient utilization of unannotated data in low-resource language scenarios.
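
The abstract does not say how semantic uncertainty is computed, so the following is only a minimal sketch of the general idea of ranking monolingual sentences by an uncertainty score and keeping the highest-scoring ones for back-translation. It substitutes a simple proxy, mean per-token surprisal under a unigram model estimated from the monolingual pool itself, which is an assumption and not the authors' measure. The function names `unigram_surprisal_scores` and `select_for_back_translation` are hypothetical.

```python
import math
from collections import Counter


def unigram_surprisal_scores(sentences):
    """Score each sentence by mean per-token surprisal (in bits).

    ASSUMPTION: unigram surprisal is a stand-in proxy for "semantic
    uncertainty"; the paper's actual scoring function is not given
    in the abstract.
    """
    tokenized = [s.split() for s in sentences]
    counts = Counter(tok for toks in tokenized for tok in toks)
    total = sum(counts.values())
    scores = []
    for toks in tokenized:
        if not toks:
            # Empty sentences carry no information; give them the lowest score.
            scores.append(0.0)
            continue
        surprisal = [-math.log2(counts[t] / total) for t in toks]
        scores.append(sum(surprisal) / len(toks))
    return scores


def select_for_back_translation(sentences, k):
    """Return the k sentences with the highest proxy uncertainty score."""
    scores = unigram_surprisal_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


pool = [
    "the cat sat",
    "the cat sat",
    "the dog ran",
    "quantum flux anomaly",
]
selected = select_for_back_translation(pool, k=2)
```

In this toy pool the sentence made up entirely of rare tokens ranks first, and the repeated sentence ranks last. A real pipeline would replace the proxy with the scoring method described in the paper and pass the selected sentences to the target-to-source model to produce synthetic parallel pairs.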

Anthology ID: 2025.acl-srw.35
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Jin Zhao, Mingyang Wang, Zhu Liu
Venues: ACL | WS
Publisher: Association for Computational Linguistics
Pages: 528–538
URL: https://aclanthology.org/2025.acl-srw.35/
DOI: 10.18653/v1/2025.acl-srw.35
Cite (ACL): Yepai Jia, Yatu Ji, Xiang Xue, Lei Shi, Qing-Dao-Er-Ji Ren, Nier Wu, Na Liu, Chen Zhao, and Fu Liu. 2025. A Semantic Uncertainty Sampling Strategy for Back-Translation in Low-Resources Neural Machine Translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 528–538, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): A Semantic Uncertainty Sampling Strategy for Back-Translation in Low-Resources Neural Machine Translation (Jia et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-srw.35.pdf
