JHU WMT 2025 CreoleMT System Description: Data for Belizean Kriol and French Guianese Creole MT

Nathaniel Robinson


Abstract
This document details Johns Hopkins University’s submission to the 2025 WMT Shared Task for Creole Language Translation. We submitted exclusively to the data subtask, contributing machine translation bitext corpora for Belizean Kriol with English translations and for French Guianese Creole with French translations. These datasets contain 5,530 and 1,671 parallel lines of text, respectively, amounting to a 2,300% increase in publicly available lines of bitext for Belizean Kriol with English and a 370% increase for French Guianese Creole with French. Experiments demonstrate genre-dependent improvements on our proposed test sets when the relevant state-of-the-art model is fine-tuned on our proposed train sets, with improvements across genres of up to 33.3 chrF++.
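As a rough illustration of how the reported growth figures relate to corpus sizes, the sketch below back-computes the amount of previously available bitext implied by each percentage. It assumes that every submitted line counts as newly available data and that the percentages are rounded; the function names are hypothetical and the implied sizes are approximations, not figures reported in the paper.

```python
def percent_increase(prior_lines: float, added_lines: float) -> float:
    """Percentage growth when added_lines are added to a corpus of prior_lines."""
    if prior_lines <= 0:
        raise ValueError("prior_lines must be positive")
    return 100.0 * added_lines / prior_lines


def implied_prior_lines(added_lines: float, pct_increase: float) -> float:
    """Invert percent_increase: the prior corpus size implied by a reported increase."""
    if pct_increase <= 0:
        raise ValueError("pct_increase must be positive")
    return added_lines / (pct_increase / 100.0)


# Line counts and percentages as stated in the abstract.
# Assumption: all submitted lines are new relative to prior public data.
reported = [
    ("Belizean Kriol-English", 5530, 2300.0),
    ("French Guianese Creole-French", 1671, 370.0),
]

for pair, added, pct in reported:
    prior = implied_prior_lines(added, pct)
    print(f"{pair}: about {prior:.0f} prior lines implied by a {pct:.0f}% increase")
```

Because the published percentages are rounded, the implied prior sizes are only order-of-magnitude estimates.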
Anthology ID:
2025.wmt-1.93
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1191–1197
URL:
https://aclanthology.org/2025.wmt-1.93/
Cite (ACL):
Nathaniel Robinson. 2025. JHU WMT 2025 CreoleMT System Description: Data for Belizean Kriol and French Guianese Creole MT. In Proceedings of the Tenth Conference on Machine Translation, pages 1191–1197, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
JHU WMT 2025 CreoleMT System Description: Data for Belizean Kriol and French Guianese Creole MT (Robinson, WMT 2025)
PDF:
https://aclanthology.org/2025.wmt-1.93.pdf