Paweł Przewłocki


2022

Samsung R&D Institute Poland Participation in WMT 2022
Adam Dobrowolski | Mateusz Klimaszewski | Adam Myśliwy | Marcin Szymański | Jakub Kowalski | Kornelia Szypuła | Paweł Przewłocki | Paweł Przybysz
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes the system submitted by Samsung R&D Institute Poland to the WMT 2022 General MT task for medium- and low-resource languages: Russian and Croatian. Our approach combines iterative noised/tagged back-translation with iterative distillation. We investigated different monolingual resources and compared their influence on the final translations. We used available BERT-like models for text classification and for extracting the domains of texts. We then prepared an ensemble of NMT models adapted to multiple domains. Finally, we attempted to predict ensemble weight vectors for individual sentences from their BERT-based domain classifications. Our final models reached quality comparable to the best online translators while using only limited, constrained resources during training.