Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

Abhisek Chakrabarty; Raj Dabre; Chenchen Ding; Masao Utiyama; Eiichiro Sumita

doi:10.18653/v1/2020.coling-main.376

Abstract

In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data. Integrating manually designed or automatically extracted features into the NMT framework is known to be beneficial. However, this study emphasizes that the relevance of the features is crucial to performance. Specifically, we propose two methods, 1) self relevance and 2) word-based relevance, to improve the representation of features for NMT. Experiments are conducted on translation tasks from English to eight Asian languages, with no more than twenty thousand sentences for training. The proposed methods improve translation quality for all tasks by up to 3.09 BLEU points. Discussions with visualization explain the proposed methods: we show that the relevance methods assign weights to features, thereby enhancing their impact on low-resource machine translation.

Anthology ID: 2020.coling-main.376
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 4263–4274
URL: https://aclanthology.org/2020.coling-main.376/
DOI: 10.18653/v1/2020.coling-main.376
Cite (ACL): Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, and Eiichiro Sumita. 2020. Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4263–4274, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation (Chakrabarty et al., COLING 2020)
PDF: https://aclanthology.org/2020.coling-main.376.pdf
