UvA-MT’s Participation in the WMT24 General Translation Shared Task

Shaomu Tan, David Stap, Seth Aycock, Christof Monz, Di Wu


Abstract
Fine-tuning Large Language Models (FT-LLMs) with parallel data has emerged as a promising paradigm in recent machine translation research. In this paper, we explore the effectiveness of FT-LLMs and compare them to traditional encoder-decoder Neural Machine Translation (NMT) systems in the WMT24 General MT shared task for the English-to-Chinese direction. We implement several techniques, including Quality Estimation (QE) data filtering, supervised fine-tuning, and post-editing that integrates NMT systems with LLMs. We demonstrate that fine-tuning LLaMA2 on a high-quality but relatively small bitext dataset (100K) yields COMET results comparable to those of much smaller encoder-decoder NMT systems trained on over 22 million bitexts. However, this approach substantially underperforms on surface-level metrics such as BLEU and ChrF. We further control data quality using a COMET-based quality estimation method. Our experiments show that 1) filtering out bitexts with low COMET scores substantially improves encoder-decoder systems, but 2) no clear gains are observed for LLMs when the fine-tuning set is further refined. Finally, we show that combining NMT systems with LLMs via post-editing generally yields the best performance on the WMT24 official test set.
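As a rough illustration of the QE data filtering step described in the abstract, the sketch below scores bitexts with a reference-free COMET model and discards low-scoring pairs. It assumes the open-source unbabel-comet package; the model name (Unbabel/wmt22-cometkiwi-da), the threshold, and the helper function are illustrative assumptions, not the exact setup used in the paper.

```python
# Minimal sketch of COMET-based QE filtering of parallel data.
# Assumptions: the `unbabel-comet` package is installed and the
# reference-free CometKiwi model is available; the 0.80 threshold
# is illustrative, not the value used in the paper.
from comet import download_model, load_from_checkpoint


def filter_bitexts(pairs, threshold=0.80, batch_size=32):
    """Keep only (source, target) pairs whose QE score exceeds `threshold`."""
    model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))
    data = [{"src": src, "mt": tgt} for src, tgt in pairs]
    output = model.predict(data, batch_size=batch_size, gpus=1)
    return [pair for pair, score in zip(pairs, output.scores) if score >= threshold]


if __name__ == "__main__":
    sample = [("The cat sat on the mat.", "猫坐在垫子上。")]
    print(filter_bitexts(sample))
```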
Anthology ID: 2024.wmt-1.11
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 176–184
URL: https://aclanthology.org/2024.wmt-1.11
Cite (ACL): Shaomu Tan, David Stap, Seth Aycock, Christof Monz, and Di Wu. 2024. UvA-MT’s Participation in the WMT24 General Translation Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 176–184, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): UvA-MT’s Participation in the WMT24 General Translation Shared Task (Tan et al., WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.11.pdf