IOL Research Machine Translation Systems for WMT24 General Machine Translation Shared Task

Wenbo Zhang


Abstract
This paper describes the IOL Research team's submission to the WMT24 General Machine Translation shared task. We submitted translations for all translation directions in the task. Under the official track categorization, our system qualifies as an open system because we used open-source resources to develop our machine translation model. As large language models (LLMs) have become a standard approach to diverse NLP tasks, we built our machine translation system on LLMs. We first performed continued pretraining on open-source LLMs with tens of billions of parameters to enhance their multilingual capabilities. We then used open-source LLMs with hundreds of billions of parameters to generate synthetic data, which we combined with a small amount of additional open-source data for supervised fine-tuning. In the final stage, we used ensemble learning to further improve translation quality. According to the official automatic evaluation metrics, our system ranked first in 8 of the 11 translation directions among systems in both the open and constrained categories.
Anthology ID:
2024.wmt-1.8
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
147–154
URL:
https://aclanthology.org/2024.wmt-1.8
Cite (ACL):
Wenbo Zhang. 2024. IOL Research Machine Translation Systems for WMT24 General Machine Translation Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 147–154, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
IOL Research Machine Translation Systems for WMT24 General Machine Translation Shared Task (Zhang, WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.8.pdf