From Scarcity to Scalability: Lexicon and Grammar Enhanced Amis to Mandarin Translation with GPT Models

Joseph Lin, Kai-Ying Lin, Hung-Yu Kao


Abstract
Machine translation (MT) for low-resource languages remains constrained by extreme data scarcity, making traditional fine-tuning infeasible. This study examines Amis→Mandarin translation as a practical case, leveraging GPT-4o-mini and GPT-5-mini with dictionary integration and grammar-informed prompting. Experiments show that GPT-5-mini, supported by a dictionary, achieves usable quality (BLEU-3 ∼31, COMET ∼78, BLEURT ∼71). To address the bottleneck of incomplete dictionaries, we propose Context-Driven Lexical Augmentation, which infers Mandarin equivalents for unseen Amis terms from corpus context, raising BLEU-3 to 34 and establishing a stronger basis for semi-automatic corpus generation. These results demonstrate that expanding and refining the dictionary provides greater benefits than parameter-intensive fine-tuning in extremely low-resource settings. We also discuss the performance gap between Amis→Mandarin and Mandarin→Amis translation, attributing it to Amis’s morphological complexity and narrower semantic coverage. Overall, our resource-driven strategy offers a scalable pathway toward high-quality MT and corpus expansion, ultimately supporting both linguistic research and language revitalization.
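The abstract does not give implementation details, so the following is only a minimal sketch of what dictionary-integrated prompting could look like, not the authors' pipeline. The function names (`lookup_terms`, `build_prompt`), the prompt wording, and the whitespace tokenization are all assumptions made for illustration, and the lexicon entries are placeholders rather than real Amis words. Tokens missing from the lexicon are listed separately, since those are the terms that a step like Context-Driven Lexical Augmentation would try to resolve from context.

```python
from typing import Dict, List, Tuple


def lookup_terms(sentence: str, lexicon: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (glosses found in the lexicon, tokens with no entry).

    Assumption: simple whitespace tokenization with trailing punctuation
    stripped; a real system would likely need morphological analysis.
    """
    found: Dict[str, str] = {}
    unseen: List[str] = []
    for raw in sentence.lower().split():
        token = raw.strip(".,!?;:")
        if not token:
            continue
        if token in lexicon:
            found[token] = lexicon[token]
        elif token not in unseen:
            unseen.append(token)
    return found, unseen


def build_prompt(sentence: str, lexicon: Dict[str, str], grammar_notes: str = "") -> str:
    """Assemble a translation prompt with dictionary entries and optional grammar notes."""
    found, unseen = lookup_terms(sentence, lexicon)
    lines = ["Translate the following Amis sentence into Mandarin."]
    if grammar_notes:
        lines.append("Grammar notes: " + grammar_notes)
    if found:
        lines.append("Dictionary entries:")
        lines.extend(f"- {word}: {gloss}" for word, gloss in found.items())
    if unseen:
        lines.append("Words not in the dictionary (infer from context): " + ", ".join(unseen))
    lines.append("Sentence: " + sentence)
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder lexicon: "w1"/"w2" stand in for Amis words, "gloss-*" for Mandarin glosses.
    demo_lexicon = {"w1": "gloss-1", "w2": "gloss-2"}
    print(build_prompt("W1 w3 w2.", demo_lexicon))
```

The resulting string would then be sent to whichever chat model is under evaluation; the model call is left out here because the abstract does not specify how it was made.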
Anthology ID:
2025.rocling-main.20
Volume:
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Month:
November
Year:
2025
Address:
National Taiwan University, Taipei City, Taiwan
Editors:
Kai-Wei Chang, Ke-Han Lu, Chih-Kai Yang, Zhi-Rui Tam, Wen-Yu Chang, Chung-Che Wang
Venue:
ROCLING
Publisher:
Association for Computational Linguistics
Pages:
167–175
URL:
https://aclanthology.org/2025.rocling-main.20/
Cite (ACL):
Joseph Lin, Kai-Ying Lin, and Hung-Yu Kao. 2025. From Scarcity to Scalability: Lexicon and Grammar Enhanced Amis to Mandarin Translation with GPT Models. In Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025), pages 167–175, National Taiwan University, Taipei City, Taiwan. Association for Computational Linguistics.
Cite (Informal):
From Scarcity to Scalability: Lexicon and Grammar Enhanced Amis to Mandarin Translation with GPT Models (Lin et al., ROCLING 2025)
PDF:
https://aclanthology.org/2025.rocling-main.20.pdf