Samarendra Singh Salam


2023

Impacts of Approaches for Agglutinative-LRL Neural Machine Translation (NMT): A Case Study on Manipuri-English Pair
Gourashyam Moirangthem | Lavinia Nongbri | Samarendra Singh Salam | Kishorjit Nongmeikapam
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Neural Machine Translation (NMT) is known to be extremely challenging for Low-Resource Languages (LRL) with complex morphology. This work addresses NMT for one such LRL, Manipuri (Meeteilon), a highly agglutinative language whose words show extensive suffixation and limited prefixation. It studies the impact of approaches for mitigating the problems of NMT for an agglutinative LRL in a strictly low-resource setting. The experiments cover subword tokenization, tuning of the self-attention-based NMT model, use of a monolingual corpus through iterative back-translation, and embedding-based sentence filtering for back-translation. With only 21,204 training sentences, the work reports a BLEU score of 28.17 for Manipuri-to-English translation.
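
The abstract does not say how the embedding-based sentence filtering is carried out. One common form of the idea, sketched here purely for illustration, keeps a back-translated sentence pair only when the embeddings of its two sides are close enough under cosine similarity. The function names `cosine` and `filter_pairs`, the threshold value, and the assumption that sentence embeddings are already available as plain lists of floats are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, src_vecs, tgt_vecs, threshold=0.7):
    """Keep the sentence pairs whose source and target embeddings are similar.

    pairs     -- list of (source_sentence, synthetic_translation) tuples
    src_vecs  -- one embedding per source sentence (assumed precomputed)
    tgt_vecs  -- one embedding per translation (assumed precomputed)
    threshold -- hypothetical cut-off; it would need tuning on held-out data
    """
    kept = []
    for pair, u, v in zip(pairs, src_vecs, tgt_vecs):
        if cosine(u, v) >= threshold:
            kept.append(pair)
    return kept
```

In an iterative back-translation loop, a filter of this kind would sit between generating synthetic pairs and adding them to the training data, so that each round trains only on the pairs that pass the threshold.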