CantonMT: Cantonese-English Neural Machine Translation Looking into Evaluations

Kung Yin Hong; Lifeng Han; Riza Theresa Batista-Navarro; Goran Nenadic

Abstract

Cantonese-English is a low-resource language pair for machine translation (MT) studies, despite the vast amount of English content publicly available online and the large number of native Cantonese speakers. Building on our previous work on CANTONMT (Hong et al., 2024), where we created open-source fine-tuned systems for Cantonese-English Neural MT (NMT) using the base models NLLB, OpusMT, and mBART, together with corpus collection and creation, in this paper we report our extended experiments on model training and comparisons. In particular, we incorporated human-based evaluations using native Cantonese speakers who are also fluent in English. We designed a modified version of the HOPE metric from Gladkoff and Han (2022), named HOPES, for categorised error analysis and severity-level statistics. The models selected for human evaluations are NLLB-mBART fine-tuned and two translators from commercial companies: Bing and GPT4.

Anthology ID: 2024.amta-presentations.9
Volume: Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
Month: September
Year: 2024
Address: Chicago, USA
Editors: Marianna Martindale, Janice Campbell, Konstantin Savenkov, Shivali Goel
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 133–144
URL: https://aclanthology.org/2024.amta-presentations.9
Cite (ACL): Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, and Goran Nenadic. 2024. CantonMT: Cantonese-English Neural Machine Translation Looking into Evaluations. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations), pages 133–144, Chicago, USA. Association for Machine Translation in the Americas.
Cite (Informal): CantonMT: Cantonese-English Neural Machine Translation Looking into Evaluations (Hong et al., AMTA 2024)
PDF: https://aclanthology.org/2024.amta-presentations.9.pdf
