A Translation-Based Approach to Morphology Learning for Low Resource Languages

Tewodros Gebreselassie, Amanuel Mersha, Michael Gasser


Abstract
“Low resource languages” usually refers to languages that lack corpora and basic tools such as part-of-speech taggers. But a significant number of such languages do benefit from the availability of relatively complex linguistic descriptions of phonology, morphology, and syntax, as well as dictionaries. A further category, probably the majority of the world’s languages, lacks even these resources. In this paper, we investigate the possibility of learning the morphology of such a language by relying on its close relationship to a language with more resources. Specifically, we use a transfer-based approach to learn the morphology of the severely under-resourced language Gofa, starting with a neural morphological generator for the closely related language Wolaytta. Both languages are members of the Omotic family, spoken in southwestern Ethiopia, and, like other Omotic languages, both are morphologically complex. We first create a finite-state transducer (FST) for morphological analysis and generation for Wolaytta, based on relatively complete linguistic descriptions and lexicons for the language. Next, we train an encoder-decoder neural network on the task of morphological generation for Wolaytta, using data generated by the FST. Such a network takes a root and a set of grammatical features as input and generates a word form as output. We then elicit Gofa translations of a small set of Wolaytta words from bilingual speakers. Finally, we retrain the decoder of the Wolaytta network, using a small set of Gofa target words that are translations of the Wolaytta outputs of the original network. The evaluation shows that the transfer network performs better than a separate encoder-decoder network trained on a larger set of Gofa words. We conclude with implications for the learning of morphology for severely under-resourced languages in regions where there are related languages with more resources.
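The data flow described in the abstract (FST-generated Wolaytta training pairs, then Gofa targets swapped in for decoder retraining) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy concatenative generator stands in for the Wolaytta FST, the affix table and the root `kar` are invented placeholders rather than real Wolaytta or Gofa morphology, and the input encoding (root characters followed by one token per feature) is an assumption borrowed from common morphological reinflection setups. The names `generate`, `make_training_pairs`, and `make_transfer_pairs` are hypothetical.

```python
from itertools import product

# Hypothetical feature slots, applied in order. The suffix strings are
# invented placeholders, NOT actual Wolaytta morphemes.
SLOTS = [
    ("num", {"sg": "", "pl": "ot"}),
    ("case", {"nom": "i", "acc": "a", "gen": "u"}),
]


def generate(root, feats):
    """Toy stand-in for FST generation: root plus one suffix per slot, in slot order."""
    form = root
    for name, table in SLOTS:
        form += table[feats[name]]
    return form


def encode_input(root, feats):
    """Assumed source sequence: root characters, then one tag token per feature."""
    return list(root) + [f"<{name}={feats[name]}>" for name, _ in SLOTS]


def make_training_pairs(roots):
    """Enumerate every feature combination for each root, as an FST could, and
    return (source tokens, target characters) pairs for an encoder-decoder."""
    names = [name for name, _ in SLOTS]
    pairs = []
    for root in roots:
        for values in product(*(table.keys() for _, table in SLOTS)):
            feats = dict(zip(names, values))
            pairs.append((encode_input(root, feats), list(generate(root, feats))))
    return pairs


def make_transfer_pairs(pairs, translations):
    """Keep the Wolaytta-side input and swap in the elicited Gofa word as the target.
    `translations` maps a Wolaytta surface form to its Gofa translation;
    pairs whose Wolaytta output was not translated are skipped."""
    transfer = []
    for src, tgt in pairs:
        wolaytta_form = "".join(tgt)
        if wolaytta_form in translations:
            transfer.append((src, list(translations[wolaytta_form])))
    return transfer


if __name__ == "__main__":
    wolaytta_pairs = make_training_pairs(["kar"])  # hypothetical root
    # Hypothetical elicited translation for a single Wolaytta output.
    gofa_pairs = make_transfer_pairs(wolaytta_pairs, {"kari": "gari"})
    for src, tgt in gofa_pairs:
        print(src, "->", "".join(tgt))
```

Keeping the source side unchanged means the Wolaytta-trained encoder keeps seeing the same kind of input, while only the target strings, and hence the decoder's training signal, change to Gofa. The small size of the translated set mirrors the abstract's point that only a small number of Gofa words are elicited.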
Anthology ID:
2020.winlp-1.10
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Editors:
Rossana Cunha, Samira Shaikh, Erika Varis, Ryan Georgi, Alicia Tsai, Antonios Anastasopoulos, Khyathi Raghavi Chandu
Venue:
WiNLP
Publisher:
Association for Computational Linguistics
Pages:
36–40
URL:
https://aclanthology.org/2020.winlp-1.10
DOI:
10.18653/v1/2020.winlp-1.10
Cite (ACL):
Tewodros Gebreselassie, Amanuel Mersha, and Michael Gasser. 2020. A Translation-Based Approach to Morphology Learning for Low Resource Languages. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 36–40, Seattle, USA. Association for Computational Linguistics.
Cite (Informal):
A Translation-Based Approach to Morphology Learning for Low Resource Languages (Gebreselassie et al., WiNLP 2020)
Video:
http://slideslive.com/38929546