A Simple and Fast Strategy for Handling Rare Words in Neural Machine Translation

Nguyen-Hoang Minh-Cong, Vinh Thi Ngo, Van Vinh Nguyen


Abstract
Neural Machine Translation (NMT) currently achieves state-of-the-art results in machine translation. However, handling rare words remains a major challenge for translation systems. Rare words are often translated using a manual dictionary or copied unchanged from the source to the target. In this paper, we propose a simple and fast strategy for integrating constraints during training and decoding to improve the translation of rare words. The effectiveness of our proposal is demonstrated on both high- and low-resource translation tasks, covering the language pairs English → Vietnamese, Chinese → Vietnamese, Khmer → Vietnamese, and Lao → Vietnamese. We show improvements of up to +1.8 BLEU over the baseline systems.
Anthology ID:
2022.aacl-srw.6
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
November
Year:
2022
Address:
Online
Editors:
Yan Hanqi, Yang Zonghan, Sebastian Ruder, Wan Xiaojun
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
40–46
URL:
https://aclanthology.org/2022.aacl-srw.6
Cite (ACL):
Nguyen-Hoang Minh-Cong, Vinh Thi Ngo, and Van Vinh Nguyen. 2022. A Simple and Fast Strategy for Handling Rare Words in Neural Machine Translation. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 40–46, Online. Association for Computational Linguistics.
Cite (Informal):
A Simple and Fast Strategy for Handling Rare Words in Neural Machine Translation (Minh-Cong et al., AACL-IJCNLP 2022)
PDF:
https://aclanthology.org/2022.aacl-srw.6.pdf