Naive Regularizers for Low-Resource Neural Machine Translation

Meriem Beloucif, Ana Valeria Gonzalez, Marcel Bollmann, Anders Søgaard


Abstract
Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. Neural models have to be trained on large amounts of data and have been shown to perform poorly when only limited data is available. We show that naive regularization methods based on sentence length, punctuation, and word frequencies, which penalize translations that are very different from the input sentences, consistently improve translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size from 17k to 230k sentence pairs. Our best regularizer achieves an average increase of 1.5 BLEU points and 1.0 TER points across all language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English–Vietnamese translation task simply by using relative differences in punctuation as a regularizer.
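The abstract does not spell out how the regularizers are computed, so the following is only an illustrative sketch of the general idea: measure how much a candidate translation differs from its input in length or punctuation count, and add that difference to the training loss. The helper names (`relative_difference`, `length_penalty`, `punctuation_penalty`, `regularized_loss`), the normalization by the larger count, and the `weight` value are all assumptions made for this example, not the authors' formulation.

```python
import string

# Assumption: a token counts as punctuation if every character in it is
# ASCII punctuation. The paper may use a different definition.
PUNCTUATION = set(string.punctuation)


def relative_difference(a: float, b: float) -> float:
    """Return |a - b| / max(a, b), or 0.0 when both values are zero."""
    largest = max(a, b)
    if largest == 0:
        return 0.0
    return abs(a - b) / largest


def count_punctuation(tokens: list[str]) -> int:
    """Count tokens that consist only of punctuation characters."""
    return sum(
        1 for tok in tokens if tok and all(ch in PUNCTUATION for ch in tok)
    )


def length_penalty(source: list[str], target: list[str]) -> float:
    """Relative difference in sentence length (hypothetical regularizer)."""
    return relative_difference(len(source), len(target))


def punctuation_penalty(source: list[str], target: list[str]) -> float:
    """Relative difference in punctuation count (hypothetical regularizer)."""
    return relative_difference(
        count_punctuation(source), count_punctuation(target)
    )


def regularized_loss(base_loss: float, penalty: float, weight: float = 0.1) -> float:
    """Add a weighted penalty to the base loss; `weight` is an assumed value."""
    return base_loss + weight * penalty


source = "Hello , world !".split()
target = "Xin chào thế giới !".split()
loss = regularized_loss(2.0, punctuation_penalty(source, target))
```

In this toy pair the source has two punctuation tokens and the target has one, so the punctuation penalty is larger than the length penalty even though the sentence lengths are close. In an actual NMT system the penalty would be computed on model outputs during training rather than on fixed strings.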
Anthology ID:
R19-1013
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
102–111
URL:
https://aclanthology.org/R19-1013
DOI:
10.26615/978-954-452-056-4_013
Cite (ACL):
Meriem Beloucif, Ana Valeria Gonzalez, Marcel Bollmann, and Anders Søgaard. 2019. Naive Regularizers for Low-Resource Neural Machine Translation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 102–111, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Naive Regularizers for Low-Resource Neural Machine Translation (Beloucif et al., RANLP 2019)
PDF:
https://aclanthology.org/R19-1013.pdf