Neural Machine Translation between similar South-Slavic languages

Maja Popović, Alberto Poncelas


Abstract
This paper describes the ADAPT-DCU machine translation systems built for the WMT 2020 shared task on Similar Language Translation. We explored several NMT set-ups for the Croatian–Slovenian and Serbian–Slovenian language pairs in both translation directions. Our experiments focus on different amounts and types of training data: we first apply basic filtering to the OpenSubtitles training corpora, then perform additional cleaning of the remaining misaligned segments based on character n-gram matching. Finally, we make use of additional monolingual data by creating synthetic parallel data through back-translation. Automatic evaluation shows that multilingual systems trained on joint Serbian and Croatian data outperform bilingual ones, and that character-based cleaning improves scores while using less data. The results also confirm once more that adding back-translated data further improves performance, especially when the synthetic data is similar to the domain of the development and test sets. This, however, might come at the price of prolonged training time, especially for multitarget systems.
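The abstract does not spell out how the character n-gram matching step works, so the following is only a minimal sketch of one plausible reading: because the languages are closely related, a correctly aligned segment pair should share a fair number of character n-grams, and pairs with very low overlap are likely misaligned. The function names, the F1-style score, the n-gram order of 3, and the threshold of 0.1 are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return a multiset of character n-grams, ignoring whitespace and case."""
    s = "".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_overlap(src, tgt, n=3):
    """F1-style overlap between the character n-grams of two segments.

    Hypothetical scoring function: the paper may use a different measure.
    """
    a, b = char_ngrams(src, n), char_ngrams(tgt, n)
    if not a or not b:
        return 0.0
    matched = sum((a & b).values())  # multiset intersection
    precision = matched / sum(b.values())
    recall = matched / sum(a.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def filter_pairs(pairs, threshold=0.1, n=3):
    """Keep only segment pairs whose overlap reaches the (assumed) threshold."""
    return [(s, t) for s, t in pairs if ngram_overlap(s, t, n) >= threshold]


# Illustrative segments, not taken from the shared-task data.
pairs = [
    ("dobar dan svima", "dober dan vsem"),  # plausibly aligned
    ("dobar dan svima", "xyzq wkpj"),       # clearly unrelated
]
kept = filter_pairs(pairs)
```

In practice the threshold would have to be tuned on held-out data, since too strict a value would also discard correct pairs that happen to be translated freely.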
Anthology ID:
2020.wmt-1.51
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
430–436
URL:
https://aclanthology.org/2020.wmt-1.51
Cite (ACL):
Maja Popović and Alberto Poncelas. 2020. Neural Machine Translation between similar South-Slavic languages. In Proceedings of the Fifth Conference on Machine Translation, pages 430–436, Online. Association for Computational Linguistics.
Cite (Informal):
Neural Machine Translation between similar South-Slavic languages (Popović & Poncelas, WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.51.pdf