Manthan Mehta


2020

Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration
Parth Patel | Manthan Mehta | Pushpak Bhattacharya | Arjun Atreya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

In this paper, we present a novel transliteration technique based on Orthographic Syllable (OS) segmentation for low-resource Indian languages (ILs). Given that alignment has produced promising results in Statistical Machine Transliteration systems and that phonology plays an important role in transliteration, we introduce a new model that uses an alignment representation similar to that of IBM Model 3 to pre-process the tokenized input sequence and then uses pre-trained source and target OS embeddings for training. We apply our model to transliteration from ILs to English and report accuracy as Top-1 Exact Match. We also compare our accuracy with that of a previously proposed Phrase-Based model and report improvements.
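The abstract does not spell out the segmentation procedure or the metric, so the following is only a rough illustration, not the authors' code. It assumes the common definition of an orthographic syllable as a consonant cluster followed by a vowel (an explicit vowel sign or the inherent schwa), handles Devanagari only, and assumes Top-1 Exact Match counts a word as correct when the single best output equals any of its reference transliterations. The names `segment_os`, `attaches` and `top1_exact_match` are hypothetical; the sketch needs Python 3.9+.

```python
"""Sketch: Devanagari orthographic-syllable (OS) segmentation and Top-1 exact match.

Simplified, Devanagari-only illustration under stated assumptions;
not the implementation described in the paper.
"""

VIRAMA = "\u094d"  # halant: joins consonants into a cluster
NUKTA = "\u093c"   # dot below a consonant
# Candrabindu, anusvara and visarga attach to the preceding syllable.
MODIFIERS = {"\u0901", "\u0902", "\u0903"}


def is_consonant(ch: str) -> bool:
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095f"


def is_vowel_sign(ch: str) -> bool:
    # Dependent vowel signs (matras); U+094D (virama) is deliberately excluded.
    return "\u093e" <= ch <= "\u094c" or ch in {
        "\u093a", "\u093b", "\u094e", "\u094f", "\u0962", "\u0963"
    }


def attaches(prev: str, nxt: str) -> bool:
    """Return True if `nxt` continues the syllable that currently ends with `prev`."""
    if prev == VIRAMA:
        return True  # the consonant cluster continues
    if nxt in MODIFIERS:
        return True
    if is_consonant(prev) or prev == NUKTA:
        return nxt in (VIRAMA, NUKTA) or is_vowel_sign(nxt)
    return False


def segment_os(word: str) -> list[str]:
    """Split a word into C*V syllables; a bare consonant (inherent schwa) ends one."""
    if not word:
        return []
    syllables = [word[0]]
    for prev, nxt in zip(word, word[1:]):
        if attaches(prev, nxt):
            syllables[-1] += nxt
        else:
            syllables.append(nxt)
    return syllables


def top1_exact_match(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of words whose top-1 output exactly equals one of its references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(pred in refs for pred, refs in zip(predictions, references))
    return hits / len(predictions)


if __name__ == "__main__":
    print(segment_os("भारत"))  # ['भा', 'र', 'त']
    print(top1_exact_match(["bharat", "hindi"], [["bharat", "bharath"], ["hindee"]]))  # 0.5
```

Treating each bare consonant as its own syllable keeps the segmentation purely orthographic; it ignores schwa deletion, which a phonology-aware model would still need to learn.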