CUNI Systems in WMT21: Revisiting Backtranslation Techniques for English-Czech NMT

Petr Gebauer, Ondřej Bojar, Vojtěch Švandelík, Martin Popel


Abstract
We describe our two NMT systems submitted to the WMT2021 shared task in English-Czech news translation: CUNI-DocTransformer (document-level CUBBITT) and CUNI-Marian-Baselines. We improve the former with better sentence-segmentation pre-processing and a post-processing step that fixes errors in numbers and units. We use the latter for experiments with various backtranslation techniques.
Anthology ID:
2021.wmt-1.7
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzmán, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
123–129
URL:
https://aclanthology.org/2021.wmt-1.7
Cite (ACL):
Petr Gebauer, Ondřej Bojar, Vojtěch Švandelík, and Martin Popel. 2021. CUNI Systems in WMT21: Revisiting Backtranslation Techniques for English-Czech NMT. In Proceedings of the Sixth Conference on Machine Translation, pages 123–129, Online. Association for Computational Linguistics.
Cite (Informal):
CUNI Systems in WMT21: Revisiting Backtranslation Techniques for English-Czech NMT (Gebauer et al., WMT 2021)
PDF:
https://aclanthology.org/2021.wmt-1.7.pdf