Unsupervised Machine Translation On Dravidian Languages

Sai Koneru, Danni Liu, Jan Niehues


Abstract
Unsupervised Neural Machine Translation (UNMT) is especially beneficial for under-resourced languages such as those from the Dravidian family, since it learns to translate between source and target relying solely on monolingual corpora. However, UNMT systems often fail in scenarios that commonly arise with low-resource languages. Recent work has achieved state-of-the-art results by adding auxiliary parallel data for similar languages. In this work, we focus on unsupervised translation between English and Kannada using limited amounts of auxiliary data between English and other Dravidian languages. We show that transliteration is essential for unsupervised translation between Dravidian languages, as they do not share a common writing system. We explore several model architectures that use the auxiliary data to maximize knowledge sharing and enable UNMT for dissimilar language pairs. Our experiments show that it is crucial for Kannada and the reference languages to be similar. Further, we propose a method to measure language similarity in order to choose the most beneficial reference languages.
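The abstract does not name a transliteration tool; a minimal sketch of the idea, assuming the open-source indic_transliteration package rather than the authors' actual pipeline, is to project Kannada and a reference Dravidian language (here Telugu) into one common script so cognates share surface forms before training:

from indic_transliteration import sanscript
from indic_transliteration.sanscript import transliterate

# Example sentences in their native scripts (illustrative only).
kannada_text = "ನಮಸ್ಕಾರ"    # Kannada greeting
telugu_text = "నమస్కారం"    # related Telugu greeting

# Map both into a single romanization (IAST) so the shared
# subword vocabulary can capture cross-lingual overlap.
common_kn = transliterate(kannada_text, sanscript.KANNADA, sanscript.IAST)
common_te = transliterate(telugu_text, sanscript.TELUGU, sanscript.IAST)

print(common_kn)  # roughly "namaskāra"
print(common_te)  # roughly "namaskāraṁ"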
Anthology ID:
2021.dravidianlangtech-1.7
Volume:
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Month:
April
Year:
2021
Address:
Kyiv
Editors:
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
Venue:
DravidianLangTech
Publisher:
Association for Computational Linguistics
Pages:
55–64
URL:
https://aclanthology.org/2021.dravidianlangtech-1.7
Cite (ACL):
Sai Koneru, Danni Liu, and Jan Niehues. 2021. Unsupervised Machine Translation On Dravidian Languages. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 55–64, Kyiv. Association for Computational Linguistics.
Cite (Informal):
Unsupervised Machine Translation On Dravidian Languages (Koneru et al., DravidianLangTech 2021)
PDF:
https://aclanthology.org/2021.dravidianlangtech-1.7.pdf