Kai-Ying Lin


2025

From Scarcity to Scalability: Lexicon and Grammar Enhanced Amis to Mandarin Translation with GPT Models
Joseph Lin | Kai-Ying Lin | Hung-Yu Kao
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Machine translation (MT) for low-resource languages remains constrained by extreme data scarcity, making traditional fine-tuning infeasible. This study examines Amis→Mandarin translation as a practical case, leveraging GPT-4o-mini and GPT-5-mini with dictionary integration and grammar-informed prompting. Experiments show that GPT-5-mini, supported by a dictionary, achieves usable quality (BLEU-3 ∼31, COMET ∼78, BLEURT ∼71). To address the bottleneck of incomplete dictionaries, we propose Context-Driven Lexical Augmentation, which infers Mandarin equivalents for unseen Amis terms from corpus context, raising BLEU-3 to 34 and establishing a stronger basis for semi-automatic corpus generation. These results demonstrate that expanding and refining the dictionary provides greater benefits than parameter-intensive fine-tuning in extremely low-resource settings. We also discuss the performance gap between Amis→Mandarin and Mandarin→Amis translation, attributing it to Amis’s morphological complexity and narrower semantic coverage. Overall, our resource-driven strategy offers a scalable pathway toward high-quality MT and corpus expansion, ultimately supporting both linguistic research and language revitalization.
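
To make the abstract's two ideas concrete, the following is a minimal Python sketch of dictionary-enhanced, grammar-informed prompting, with a context-driven fallback for unseen terms in the spirit of Context-Driven Lexical Augmentation. It assumes an OpenAI-style chat API; the lexicon fragment, grammar note, prompt wording, Amis examples, and function names are illustrative placeholders, not the authors' actual resources or prompts.

```python
# Sketch: dictionary-enhanced Amis -> Mandarin translation with a
# context-driven gloss fallback. Assumes an OpenAI-style chat API;
# all lexicon entries, prompts, and examples are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical Amis -> Mandarin lexicon fragment (dictionary integration).
LEXICON = {"kako": "我", "maolah": "喜歡"}

# Hypothetical grammar hint (grammar-informed prompting).
GRAMMAR_NOTE = "Amis is typically verb-initial; reorder to Mandarin SVO."


def chat(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Single-turn helper around the chat completions endpoint."""
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()


def infer_gloss(oov_word: str, context_sentences: list[str]) -> str:
    """Context-driven augmentation (sketch): infer a Mandarin equivalent
    for an unseen Amis word from corpus sentences that use it."""
    examples = "\n".join(context_sentences)
    return chat(
        f"The Amis word '{oov_word}' appears in these sentences:\n"
        f"{examples}\n"
        "Infer its most likely Mandarin equivalent. Answer with the word only."
    )


def translate(amis_sentence: str, corpus: dict[str, list[str]]) -> str:
    """Dictionary-enhanced translation; unseen words get inferred glosses."""
    glosses = {}
    for word in amis_sentence.split():
        if word in LEXICON:
            glosses[word] = LEXICON[word]
        elif word in corpus:  # fall back to context-driven inference
            glosses[word] = LEXICON[word] = infer_gloss(word, corpus[word])
    gloss_block = "\n".join(f"{w}: {g}" for w, g in glosses.items())
    return chat(
        "Translate the following Amis sentence into Mandarin.\n"
        f"Grammar note: {GRAMMAR_NOTE}\n"
        f"Dictionary glosses:\n{gloss_block}\n\n"
        f"Amis: {amis_sentence}\nMandarin:"
    )


if __name__ == "__main__":
    # Hypothetical monolingual corpus keyed by word, used only for
    # inferring glosses of out-of-lexicon terms.
    corpus = {"romadiw": ["Maolah kako a romadiw."]}
    print(translate("maolah kako romadiw", corpus))
```

The design choice the sketch illustrates is the abstract's central claim: the translation model itself stays fixed, and quality is driven by enriching the prompt-side resources, so each inferred gloss is written back into the lexicon, which is what makes the approach a basis for semi-automatic corpus and dictionary expansion rather than a one-off translation call.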