Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration

Parth Patel, Manthan Mehta, Pushpak Bhattacharya, Arjun Atreya


Abstract
In this paper, we present a novel transliteration technique based on Orthographic Syllable (OS) segmentation for low-resource Indian languages (ILs). Given that alignment has produced promising results in Statistical Machine Transliteration systems and that phonology plays an important role in transliteration, we introduce a new model that uses an alignment representation similar to that of IBM Model 3 to pre-process the tokenized input sequence and then uses pre-trained source and target OS embeddings for training. We apply our model to transliteration from ILs to English and report accuracy in terms of Top-1 Exact Match. We also compare our accuracy with that of a previously proposed Phrase-Based model and report improvements.
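The abstract does not state the segmentation rules. A minimal sketch, assuming the common definition of an orthographic syllable as a consonant cluster plus one vowel (inherent, independent or a vowel sign) with any anusvara, candrabindu or visarga attached, might segment Devanagari input as follows. The helper names `os_segment` and `is_consonant` are hypothetical, and this is not the authors' implementation.

```python
import unicodedata

VIRAMA = "\u094d"  # Devanagari sign virama (halant)


def is_consonant(ch: str) -> bool:
    """True for Devanagari consonants KA..HA and the precomposed nukta forms."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def os_segment(word: str) -> list[str]:
    """Split a Devanagari word into orthographic syllables (hypothetical helper).

    Assumed rule: a unit is a consonant cluster (consonants joined by virama)
    or an independent vowel, plus any following combining marks (vowel signs,
    nukta, anusvara, candrabindu, visarga). Any other character stands alone.
    """
    units: list[str] = []
    current = ""
    for ch in word:
        if unicodedata.category(ch) in ("Mn", "Mc"):
            # Combining marks always attach to the unit being built.
            current += ch
        elif is_consonant(ch) and current.endswith(VIRAMA):
            # A consonant after a virama extends the conjunct cluster.
            current += ch
        else:
            # Any other character closes the previous unit and opens a new one;
            # if that unit ended in a bare consonant, it keeps its inherent vowel.
            if current:
                units.append(current)
            current = ch
    if current:
        units.append(current)
    return units


if __name__ == "__main__":
    print(os_segment("भारत"))  # ['भा', 'र', 'त']
    print(os_segment("कर्म"))  # ['क', 'र्म']
```

Joining the units always reproduces the input word, so the segmentation is lossless; zero-width joiners and scripts other than Devanagari are not handled specially in this sketch.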
Anthology ID: 2020.icon-main.51
Volume: Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2020
Address: Indian Institute of Technology Patna, Patna, India
Editors: Pushpak Bhattacharyya, Dipti Misra Sharma, Rajeev Sangal
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 373–378
URL: https://aclanthology.org/2020.icon-main.51
Cite (ACL): Parth Patel, Manthan Mehta, Pushpak Bhattacharya, and Arjun Atreya. 2020. Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), pages 373–378, Indian Institute of Technology Patna, Patna, India. NLP Association of India (NLPAI).
Cite (Informal): Leveraging Alignment and Phonology for low-resource Indic to English Neural Machine Transliteration (Patel et al., ICON 2020)
PDF: https://aclanthology.org/2020.icon-main.51.pdf
Optional supplementary material: 2020.icon-main.51.OptionalSupplementaryMaterial.zip