TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task

Han Yang, Bojie Hu, Wanying Xie, Ambyera Han, Pan Liu, Jinan Xu, Qi Ju


Abstract
This paper describes TenTrans’ submission to the WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. The task focuses on improving translation quality from Catalan to Occitan, Romanian, and Italian with the assistance of related high-resource languages. We mainly use back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve translation quality. On the test set, our best submitted system achieves an average case-sensitive BLEU score of 43.45 across all low-resource pairs. The data, code, and pre-trained models used in this work are available in the TenTrans evaluation examples.
Anthology ID:
2021.wmt-1.45
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Editors:
Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
376–382
URL:
https://aclanthology.org/2021.wmt-1.45
Cite (ACL):
Han Yang, Bojie Hu, Wanying Xie, Ambyera Han, Pan Liu, Jinan Xu, and Qi Ju. 2021. TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task. In Proceedings of the Sixth Conference on Machine Translation, pages 376–382, Online. Association for Computational Linguistics.
Cite (Informal):
TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task (Yang et al., WMT 2021)
PDF:
https://aclanthology.org/2021.wmt-1.45.pdf