Feature Decay Algorithms for Neural Machine Translation

Alberto Poncelas, Gideon Maillette de Buy Wenniger, Andy Way


Abstract
Neural Machine Translation (NMT) systems require large amounts of data to be competitive. For this reason, data selection techniques are typically used only to fine-tune systems that have already been trained on larger amounts of data. In this work we use Feature Decay Algorithms (FDA), a data selection technique, not only to fine-tune a system but also to build a complete system from less data. Our findings show that it is possible to find a subset of sentence pairs that, when used to train a German-English NMT system, outperforms the full training corpus by 1.11 BLEU points.
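The abstract names FDA without describing how it selects data. As a rough illustration only, the Python sketch below implements a simplified, greedy FDA-style selection: each candidate sentence is scored by the test-set n-grams it contains, normalized by its length, and every n-gram's value is multiplied by a decay factor each time it appears in an already-selected sentence, so later picks favour n-grams that are not yet covered. The function names, the maximum n-gram order (max_n=3), the decay factor of 0.5 and the initial feature value of 1 are assumptions made for this example, not the configuration used in the paper.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Return all n-grams (as tuples) of length 1..max_n in a token list."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fda_select(pool, test, k, max_n=3, decay=0.5):
    """Greedily pick up to k sentence indices from pool (simplified FDA sketch).

    pool, test: lists of tokenized sentences (lists of str).
    Features are the n-grams occurring in the test set. A feature starts with
    value 1 and its value is multiplied by `decay` once for every occurrence
    in the sentences selected so far (hypothetical simplification).
    """
    features = set()
    for sent in test:
        features.update(ngrams(sent, max_n))

    selected_counts = Counter()  # occurrences of each feature in the selection

    def score(idx):
        sent = pool[idx]
        if not sent:
            return 0.0
        # Each distinct test-set feature in the sentence contributes its decayed value.
        feats = set(ngrams(sent, max_n)) & features
        return sum(decay ** selected_counts[f] for f in feats) / len(sent)

    remaining = list(range(len(pool)))
    selected = []
    for _ in range(min(k, len(pool))):
        best = max(remaining, key=score)  # ties go to the earliest index
        remaining.remove(best)
        selected.append(best)
        # Decay the features covered by the newly selected sentence.
        selected_counts.update(f for f in ngrams(pool[best], max_n) if f in features)
    return selected


if __name__ == "__main__":
    test_set = [["the", "cat", "sat"], ["a", "dog", "ran"]]
    pool = [["the", "cat", "sat"], ["the", "cat", "sat"], ["a", "dog", "ran"]]
    print(fda_select(pool, test_set, 2))             # [0, 2]
    print(fda_select(pool, test_set, 2, decay=1.0))  # [0, 1]
```

In the toy run, the decay makes the second pick skip the duplicate of the first sentence in favour of the one covering new test-set n-grams; with decay=1.0 (no decay) the duplicate is chosen instead. Rescoring every remaining candidate at each step is quadratic and only suitable for small pools.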
Anthology ID:
2018.eamt-main.24
Volume:
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Month:
May
Year:
2018
Address:
Alicante, Spain
Editors:
Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, Mikel L. Forcada
Venue:
EAMT
Pages:
259–268
URL:
https://aclanthology.org/2018.eamt-main.24
Cite (ACL):
Alberto Poncelas, Gideon Maillette de Buy Wenniger, and Andy Way. 2018. Feature Decay Algorithms for Neural Machine Translation. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 259–268, Alicante, Spain.
Cite (Informal):
Feature Decay Algorithms for Neural Machine Translation (Poncelas et al., EAMT 2018)
PDF:
https://aclanthology.org/2018.eamt-main.24.pdf