Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration

Parth Patel, Manthan Mehta, Pushpak Bhattacharya, Arjun Atreya


Abstract
In this paper, we present a novel transliteration technique based on Orthographic Syllable (OS) segmentation for low-resource Indian languages (ILs). Given that alignment has produced promising results in Statistical Machine Transliteration systems and that phonology plays an important role in transliteration, we introduce a new model that uses an alignment representation similar to that of IBM Model 3 to pre-process the tokenized input sequence and then uses pre-trained source and target OS embeddings for training. We apply our model to transliteration from ILs to English and report accuracy based on Top-1 Exact Match. We also compare our accuracy with that of a previously proposed phrase-based model and report improvements.
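As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how Top-1 Exact Match accuracy is commonly computed. It is not taken from the paper: the function name `top1_exact_match`, the ranked-candidate-list input format, and the sample words are assumptions made for this example.

```python
from typing import Sequence


def top1_exact_match(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Return the fraction of inputs whose top-ranked candidate equals the reference.

    `predictions[i]` is a ranked candidate list for input i (best first);
    `references[i]` is the single gold transliteration for input i.
    Both names and this input layout are hypothetical, chosen for illustration.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count an input as correct only if its first candidate matches exactly.
    hits = sum(
        1
        for candidates, reference in zip(predictions, references)
        if candidates and candidates[0] == reference
    )
    return hits / len(references)


# Hypothetical example: the first top candidate matches, the second does not.
sample_predictions = [["bharat", "bharath"], ["dilli", "delhi"]]
sample_references = ["bharat", "delhi"]
accuracy = top1_exact_match(sample_predictions, sample_references)
```

Under this definition a system gets no credit when the correct transliteration appears lower in its candidate list, which is why the second sample input counts as a miss.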
Anthology ID:
2020.icon-main.51
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2020
Address:
Indian Institute of Technology Patna, Patna, India
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
373–378
URL:
https://aclanthology.org/2020.icon-main.51
PDF:
https://aclanthology.org/2020.icon-main.51.pdf
Optional supplementary material:
 2020.icon-main.51.OptionalSupplementaryMaterial.zip