Kishorjit Nongmeikapam

2023

pdf bib abs
Impacts of Approaches for Agglutinative-LRL Neural Machine Translation (NMT): A Case Study on Manipuri-English Pair
Gourashyam Moirangthem | Lavinia Nongbri | Samarendra Singh Salam | Kishorjit Nongmeikapam
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Neural Machine Translation (NMT) is known to be extremely challenging for Low-Resource Languages (LRL) with complex morphology. This work deals with the NMT of a specific LRL called Manipuri/Meeteilon, which is a highly agglutinative language where words have extensive suffixation with limited prefixation. The work studies and discusses the impacts of approaches to mitigate the issues of NMT involving agglutinative LRL in a strictly low-resource setting. The research work experimented with several methods and techniques including subword tokenization, tuning of the selfattention-based NMT model, utilization of monolingual corpus by iterative backtranslation, embedding-based sentence filtering for back translation. This research work in the strictly low resource setting of only 21204 training sentences showed remarkable results with a BLEU score of 28.17 for Manipuri to English translation.

pdf bib abs
Bidirectional Neural Machine Translation (NMT) using Monolingual Data for Khasi-English Pair
Lavinia Nongbri | Gourashyam Moirangthem | Samarendra Salam | Kishorjit Nongmeikapam
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Due to a lack of parallel data, low-resource language machine translation has been unable to make the most of Neural Machine Translation. This paper investigates several approaches as to how low-resource Neural Machine Translation can be improved in a strictly low-resource setting, especially for bidirectional Khasi-English language pairs. The back-translation method is used to expand the parallel corpus using monolingual data. The work also experimented with subword tokenizers to improve the translation accuracy for new and rare words. Transformer, a cutting-edge NMT model, serves as the backbone of the bidirectional Khasi-English machine translation. The final Khasi-to-English and English-to-Khasi NMT models trained using both authentic and synthetic parallel corpora show an increase of 2.34 and 3.1 BLEU scores, respectively, when compared to the models trained using only authentic parallel dataset.

2020

pdf bib abs
Deep Neural Model for Manipuri Multiword Named Entity Recognition with Unsupervised Cluster Feature
Jimmy Laishram | Kishorjit Nongmeikapam | Sudip Naskar
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

The recognition task of Multi-Word Named Entities (MNEs) in itself is a challenging task when the language is inflectional and agglutinative. Having breakthrough NLP researches with deep neural network and language modelling techniques, the applicability of such techniques/algorithms for Indian language like Manipuri remains unanswered. In this paper an attempt to recognize Manipuri MNE is performed using a Long Short Term Memory (LSTM) recurrent neural network model in conjunction with Part Of Speech (POS) embeddings. To further improve the classification accuracy, word cluster information using K-means clustering approach is added as a feature embedding. The cluster information is generated using a Skip-gram based words vector that contains the semantic and syntactic information of each word. The model so proposed does not use extensive language morphological features to elevate its accuracy. Finally the model’s performance is compared with the other machine learning based Manipuri MNE models.