Kung Yin Hong
2024
CantonMT: Cantonese-English Neural Machine Translation Looking into Evaluations
Kung Yin Hong | Lifeng Han | Riza Batista-Navarro | Goran Nenadic
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
Cantonese-English is a low-resource language pair for machine translation (MT) studies, despite the vast amount of English content publicly available online and the large number of native Cantonese speakers. This paper builds on our previous work, CANTONMT (Hong et al., 2024). In that work, we collected and created corpora and released open-source fine-tuned systems for Cantonese-English neural MT (NMT), using NLLB, OpusMT, and mBART as base models. Here we report extended experiments on model training and comparisons. In particular, we add human evaluations carried out by native Cantonese speakers who are also fluent in English. For categorised error analysis and severity-level statistics, we designed a modified version of the HOPE metric from Gladkoff and Han (2022), which we name HOPES. The systems selected for human evaluation are the fine-tuned NLLB-mBART model and two commercial translators, Bing and GPT4.