@inproceedings{macayan-etal-2026-tao,
title = "Tao{--}{F}ilipino Neural Machine Translation: Strategies for Ultra{--}Low-Resource Settings",
author = "Macayan, Adrian Denzel and
Madridijo, Luis Andrew Sunga and
Esponilla, Ellexandrei and
Francisco, Zachary Mitchell",
editor = "Ojha, Atul Kr. and
Liu, Chao-hong and
Vylomova, Ekaterina and
Pirinen, Flammie and
Washington, Jonathan and
Oco, Nathaniel and
Zhao, Xiaobing",
booktitle = "Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages ({L}o{R}es{MT} 2026)",
month = mar,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.loresmt-1.2/",
pages = "27--36",
ISBN = "979-8-89176-366-1",
abstract = "Neural Machine Translation (NMT) performance degrades significantly in ultra-low resource settings, particularly for endangeredlanguages like Tao (Yami) which lack extensive parallel corpora. This study investigates strategies to bootstrap a Tao-Tagalog translation system using the NLLB-200 (600 million parameter) model under extremely limited supervision. We propose a multi-faceted approach combining domain-specific fine-tuning, synthetic data augmentation, and cross-lingual transfer learning. Specifically, we leverage the phylogenetic proximity of Ivatan, a related Batanic language, to pre-train the model, and utilize dictionary-based generation to construct synthetic conversational data. Our results demonstrate that transfer learning from Ivatan improves translation quality on in-domain religious texts, achieving a BLEU score of 34.85. Conversely, incorporating synthetic data enhances the model{'}s ability to generalize to conversational contexts, mitigating the domain bias often inherent in religious corpora. These findings highlight the effectiveness of exploiting linguistic typology and structured lexical resources to develop functional NMT systems for under-represented Austronesian languages."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="macayan-etal-2026-tao">
<titleInfo>
<title>Tao–Filipino Neural Machine Translation: Strategies for Ultra–Low-Resource Settings</title>
</titleInfo>
<name type="personal">
<namePart type="given">Adrian</namePart>
<namePart type="given">Denzel</namePart>
<namePart type="family">Macayan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="given">Andrew</namePart>
<namePart type="given">Sunga</namePart>
<namePart type="family">Madridijo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ellexandrei</namePart>
<namePart type="family">Esponilla</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zachary</namePart>
<namePart type="given">Mitchell</namePart>
<namePart type="family">Francisco</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2026-03</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Atul</namePart>
<namePart type="given">Kr.</namePart>
<namePart type="family">Ojha</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chao-hong</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Vylomova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Flammie</namePart>
<namePart type="family">Pirinen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jonathan</namePart>
<namePart type="family">Washington</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nathaniel</namePart>
<namePart type="family">Oco</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xiaobing</namePart>
<namePart type="family">Zhao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Rabat, Morocco</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-366-1</identifier>
</relatedItem>
<abstract>Neural Machine Translation (NMT) performance degrades significantly in ultra-low resource settings, particularly for endangeredlanguages like Tao (Yami) which lack extensive parallel corpora. This study investigates strategies to bootstrap a Tao-Tagalog translation system using the NLLB-200 (600 million parameter) model under extremely limited supervision. We propose a multi-faceted approach combining domain-specific fine-tuning, synthetic data augmentation, and cross-lingual transfer learning. Specifically, we leverage the phylogenetic proximity of Ivatan, a related Batanic language, to pre-train the model, and utilize dictionary-based generation to construct synthetic conversational data. Our results demonstrate that transfer learning from Ivatan improves translation quality on in-domain religious texts, achieving a BLEU score of 34.85. Conversely, incorporating synthetic data enhances the model’s ability to generalize to conversational contexts, mitigating the domain bias often inherent in religious corpora. These findings highlight the effectiveness of exploiting linguistic typology and structured lexical resources to develop functional NMT systems for under-represented Austronesian languages.</abstract>
<identifier type="citekey">macayan-etal-2026-tao</identifier>
<location>
<url>https://aclanthology.org/2026.loresmt-1.2/</url>
</location>
<part>
<date>2026-03</date>
<extent unit="page">
<start>27</start>
<end>36</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Tao–Filipino Neural Machine Translation: Strategies for Ultra–Low-Resource Settings
%A Macayan, Adrian Denzel
%A Madridijo, Luis Andrew Sunga
%A Esponilla, Ellexandrei
%A Francisco, Zachary Mitchell
%Y Ojha, Atul Kr.
%Y Liu, Chao-hong
%Y Vylomova, Ekaterina
%Y Pirinen, Flammie
%Y Washington, Jonathan
%Y Oco, Nathaniel
%Y Zhao, Xiaobing
%S Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026)
%D 2026
%8 March
%I Association for Computational Linguistics
%C Rabat, Morocco
%@ 979-8-89176-366-1
%F macayan-etal-2026-tao
%X Neural Machine Translation (NMT) performance degrades significantly in ultra-low resource settings, particularly for endangered languages like Tao (Yami) which lack extensive parallel corpora. This study investigates strategies to bootstrap a Tao-Tagalog translation system using the NLLB-200 (600 million parameter) model under extremely limited supervision. We propose a multi-faceted approach combining domain-specific fine-tuning, synthetic data augmentation, and cross-lingual transfer learning. Specifically, we leverage the phylogenetic proximity of Ivatan, a related Batanic language, to pre-train the model, and utilize dictionary-based generation to construct synthetic conversational data. Our results demonstrate that transfer learning from Ivatan improves translation quality on in-domain religious texts, achieving a BLEU score of 34.85. Conversely, incorporating synthetic data enhances the model’s ability to generalize to conversational contexts, mitigating the domain bias often inherent in religious corpora. These findings highlight the effectiveness of exploiting linguistic typology and structured lexical resources to develop functional NMT systems for under-represented Austronesian languages.
%U https://aclanthology.org/2026.loresmt-1.2/
%P 27-36
Markdown (Informal)
[Tao–Filipino Neural Machine Translation: Strategies for Ultra–Low-Resource Settings](https://aclanthology.org/2026.loresmt-1.2/) (Macayan et al., LoResMT 2026)