Samsung R&D Institute Poland submission to WMT20 News Translation Task

Mateusz Krubiński, Marcin Chochowski, Bartłomiej Boczek, Mikołaj Koszowski, Adam Dobrowolski, Marcin Szymański, Paweł Przybysz


Abstract
This paper describes the submission of Samsung R&D Institute Poland to the WMT20 shared news translation task. We submitted systems for six language directions: English to Czech, Czech to English, English to Polish, Polish to English, English to Inuktitut and Inuktitut to English. For each direction, we trained a single-direction model. However, the models for directions involving English, Polish and Czech were derived from a common multilingual base, which was then fine-tuned on each particular direction. For all translation directions, we used a similar training regime, with iterative improvement of the training corpora through back-translation, and model ensembling. For the En → Cs direction, we additionally leveraged document-level information by re-ranking the beam output with a separate model.
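As a rough illustration of the re-ranking idea mentioned in the abstract, the following minimal sketch re-orders a beam's n-best list using a second scorer. The abstract does not say how the two scores are combined, so the linear interpolation, the `weight` parameter, and the names `rerank_nbest` and `toy_context_scorer` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, List, Tuple


def rerank_nbest(
    nbest: List[Tuple[str, float]],
    context_scorer: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-order beam hypotheses by a weighted sum of two scores.

    nbest: (hypothesis, translation-model log-score) pairs from beam search.
    context_scorer: hypothetical second model returning a log-score that
        reflects how well a hypothesis fits the surrounding document.
    weight: assumed interpolation weight given to the second model.
    """
    rescored = [
        (hyp, (1.0 - weight) * mt_score + weight * context_scorer(hyp))
        for hyp, mt_score in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_context_scorer(hyp: str) -> float:
    # Stand-in for a document-level model: prefers the word "bank",
    # as if earlier sentences in the document were about a river bank.
    return 0.0 if "bank" in hyp else -5.0


nbest = [("He sat on the shore.", -1.0), ("He sat on the bank.", -1.2)]
best = rerank_nbest(nbest, toy_context_scorer)[0][0]
```

With `weight=0.0` the original beam order is preserved, so the weight controls how strongly the second model can override the translation model's own ranking.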
Anthology ID:
2020.wmt-1.16
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
181–190
URL:
https://aclanthology.org/2020.wmt-1.16
Cite (ACL):
Mateusz Krubiński, Marcin Chochowski, Bartłomiej Boczek, Mikołaj Koszowski, Adam Dobrowolski, Marcin Szymański, and Paweł Przybysz. 2020. Samsung R&D Institute Poland submission to WMT20 News Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 181–190, Online. Association for Computational Linguistics.
Cite (Informal):
Samsung R&D Institute Poland submission to WMT20 News Translation Task (Krubiński et al., WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.16.pdf
Video:
https://slideslive.com/38939566