IOL Research Machine Translation Systems for WMT23 General Machine Translation Shared Task

Wenbo Zhang


Abstract
This paper describes the IOL Research team’s submission systems for the WMT23 general machine translation shared task. We participated in two translation directions: English-to-Chinese and Chinese-to-English. Our final primary submissions are constrained systems, meaning that for both directions we use only the officially provided monolingual and bilingual data to train the translation systems. Our systems are based on the Transformer architecture with pre-norm or deep-norm, both of which have been shown to help in training deeper models. We employ back-translation, data diversification, domain fine-tuning, and model ensembling to build our translation systems. Two aspects worth highlighting are our careful data cleaning process and our use of a substantial amount of monolingual data for data augmentation. Compared with the baseline system, our submissions achieve a large improvement in BLEU score.
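As an illustration of the two normalization placements named in the abstract, the following is a minimal sketch in plain Python. It is not the submission's code. It assumes the commonly used definitions: pre-norm computes x + f(LN(x)), and deep-norm computes LN(alpha * x + f(x)). The function names, the unparameterized layer normalization, and the toy sublayer are all hypothetical, and `alpha` is left as a caller-supplied constant because the abstract does not state the value used.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    """Pre-norm residual: normalize the input, apply the sublayer, add the raw input."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def deep_norm_block(x, sublayer, alpha):
    """Deep-norm residual: scale the residual by alpha, add the sublayer output, then normalize."""
    y = sublayer(x)
    return layer_norm([alpha * a + b for a, b in zip(x, y)])


if __name__ == "__main__":
    # Toy sublayer standing in for attention or a feed-forward network (assumption).
    halve = lambda v: [0.5 * t for t in v]
    x = [1.0, 2.0, 3.0, 4.0]
    print(pre_norm_block(x, halve))
    print(deep_norm_block(x, halve, alpha=2.0))
```

In the pre-norm form the residual path is left unnormalized, so the input passes through unchanged when the sublayer contributes nothing. In the deep-norm form the output is always normalized, and the residual is up-weighted by `alpha` before normalization.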
Anthology ID:
2023.wmt-1.19
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
187–191
URL:
https://aclanthology.org/2023.wmt-1.19
DOI:
10.18653/v1/2023.wmt-1.19
Cite (ACL):
Wenbo Zhang. 2023. IOL Research Machine Translation Systems for WMT23 General Machine Translation Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 187–191, Singapore. Association for Computational Linguistics.
Cite (Informal):
IOL Research Machine Translation Systems for WMT23 General Machine Translation Shared Task (Zhang, WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.19.pdf