Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

Nathaniel Robinson, Cameron Hogan, Nancy Fulda, David R. Mortensen


Abstract
Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with the amount of training data and the relationships between knowledge-sharing languages. Our experiments suggest that, for some languages, beyond a threshold of authentic data, back-translation augmentation methods are counterproductive, while cross-lingual transfer from a sufficiently related language is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation makes statistically significant improvements over conventional methods. In very low-resource Jamaican MT, code-switching with a transfer language for orthographic resemblance yields a 6.63 BLEU point advantage.
Anthology ID:
2022.loresmt-1.5
Volume:
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Atul Kr. Ojha, Chao-Hong Liu, Ekaterina Vylomova, Jade Abbott, Jonathan Washington, Nathaniel Oco, Tommi A Pirinen, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
Venue:
LoResMT
Publisher:
Association for Computational Linguistics
Pages:
35–42
URL:
https://aclanthology.org/2022.loresmt-1.5
Cite (ACL):
Nathaniel Robinson, Cameron Hogan, Nancy Fulda, and David R. Mortensen. 2022. Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican. In Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022), pages 35–42, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal):
Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican (Robinson et al., LoResMT 2022)
PDF:
https://aclanthology.org/2022.loresmt-1.5.pdf