@inproceedings{martinez-matsumoto-2016-improving,
  title     = {Improving {Neural Machine Translation} on resource-limited pairs using auxiliary data of a third language},
  author    = {Martinez, Ander and
               Matsumoto, Yuji},
  editor    = {Green, Spence and
               Schwartz, Lane},
  booktitle = {Conferences of the Association for Machine Translation in the Americas: {MT} Researchers' Track},
  month     = oct # " 28 -- " # nov # " 1",
  year      = {2016},
  address   = {Austin, TX, USA},
  publisher = {The Association for Machine Translation in the Americas},
  url       = {https://aclanthology.org/2016.amta-researchers.11},
  pages     = {135--148},
  abstract  = {In the recent years interest in Deep Neural Networks (DNN) has grown in the field of Natural Language Processing, as new training methods have been proposed. The usage of DNN has achieved state-of-the-art performance in various areas. Neural Machine Translation (NMT) described by Bahdanau et al. (2014) and its successive variations have shown promising results. DNN, however, tend to over-fit on small data-sets, which makes this method impracticable for resource-limited language pairs. This article combines three different ideas (splitting words into smaller units, using an extra dataset of a related language pair and using monolingual data) for improving the performance of NMT models on language pairs with limited data. Our experiments show that, in some cases, our proposed approach to subword-units performs better than BPE (Byte pair encoding) and that auxiliary language-pairs and monolingual data can help improve the performance of languages with limited resources.},
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="martinez-matsumoto-2016-improving">
<titleInfo>
<title>Improving Neural Machine Translation on resource-limited pairs using auxiliary data of a third language</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ander</namePart>
<namePart type="family">Martinez</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yuji</namePart>
<namePart type="family">Matsumoto</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued point="start">2016-10-28</dateIssued>
<dateIssued point="end">2016-11-01</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Conferences of the Association for Machine Translation in the Americas: MT Researchers’ Track</title>
</titleInfo>
<name type="personal">
<namePart type="given">Spence</namePart>
<namePart type="family">Green</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lane</namePart>
<namePart type="family">Schwartz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>The Association for Machine Translation in the Americas</publisher>
<place>
<placeTerm type="text">Austin, TX, USA</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In the recent years interest in Deep Neural Networks (DNN) has grown in the field of Natural Language Processing, as new training methods have been proposed. The usage of DNN has achieved state-of-the-art performance in various areas. Neural Machine Translation (NMT) described by Bahdanau et al. (2014) and its successive variations have shown promising results. DNN, however, tend to over-fit on small data-sets, which makes this method impracticable for resource-limited language pairs. This article combines three different ideas (splitting words into smaller units, using an extra dataset of a related language pair and using monolingual data) for improving the performance of NMT models on language pairs with limited data. Our experiments show that, in some cases, our proposed approach to subword-units performs better than BPE (Byte pair encoding) and that auxiliary language-pairs and monolingual data can help improve the performance of languages with limited resources.</abstract>
<identifier type="citekey">martinez-matsumoto-2016-improving</identifier>
<location>
<url>https://aclanthology.org/2016.amta-researchers.11</url>
</location>
<part>
<date>2016-10-28/2016-11-01</date>
<extent unit="page">
<start>135</start>
<end>148</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Improving Neural Machine Translation on resource-limited pairs using auxiliary data of a third language
%A Martinez, Ander
%A Matsumoto, Yuji
%Y Green, Spence
%Y Schwartz, Lane
%S Conferences of the Association for Machine Translation in the Americas: MT Researchers’ Track
%D 2016
%8 October 28 - November 1
%I The Association for Machine Translation in the Americas
%C Austin, TX, USA
%F martinez-matsumoto-2016-improving
%X In the recent years interest in Deep Neural Networks (DNN) has grown in the field of Natural Language Processing, as new training methods have been proposed. The usage of DNN has achieved state-of-the-art performance in various areas. Neural Machine Translation (NMT) described by Bahdanau et al. (2014) and its successive variations have shown promising results. DNN, however, tend to over-fit on small data-sets, which makes this method impracticable for resource-limited language pairs. This article combines three different ideas (splitting words into smaller units, using an extra dataset of a related language pair and using monolingual data) for improving the performance of NMT models on language pairs with limited data. Our experiments show that, in some cases, our proposed approach to subword-units performs better than BPE (Byte pair encoding) and that auxiliary language-pairs and monolingual data can help improve the performance of languages with limited resources.
%U https://aclanthology.org/2016.amta-researchers.11
%P 135-148
Markdown (Informal)
[Improving Neural Machine Translation on resource-limited pairs using auxiliary data of a third language](https://aclanthology.org/2016.amta-researchers.11) (Martinez & Matsumoto, AMTA 2016)
ACL