Thoudam Doren Singh


2023

Sentiment Analysis for the Mizo Language: A Comparative Study of Classical Machine Learning and Transfer Learning Approaches
Mercy Lalthangmawii | Thoudam Doren Singh
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Sentiment analysis, a subfield of natural language processing (NLP), has witnessed significant advancements in the analysis of user-generated content across diverse languages. However, its application to low-resource languages remains a challenge. This research addresses this gap by conducting a comprehensive sentiment analysis experiment on the Mizo language, a low-resource language predominantly spoken in the Indian state of Mizoram and neighboring regions. Our study encompasses the evaluation of various machine learning models, including Support Vector Machine (SVM), Decision Tree, Random Forest, K-Nearest Neighbor (K-NN), and Logistic Regression, as well as transfer learning using XLM-RoBERTa. The findings reveal that SVM is a robust performer for Mizo sentiment analysis, achieving the highest F1 score and accuracy among the models tested. XLM-RoBERTa, a transfer learning model, exhibits competitive performance, highlighting the potential of leveraging pre-trained multilingual models for sentiment analysis in low-resource languages. This research advances our understanding of sentiment analysis in low-resource languages and serves as a stepping stone for future investigations in this domain.
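The abstract's strongest classical baseline, SVM, is typically paired with a bag-of-words or TF-IDF representation for text classification. A minimal sketch of such a pipeline with scikit-learn is shown below; the exact features and data used in the paper are not given here, and the training texts are English placeholders standing in for labelled Mizo sentences.

```python
# Sketch of a TF-IDF + linear SVM sentiment classifier (one classical
# baseline of the kind the paper evaluates). The texts are hypothetical
# English placeholders, not the paper's Mizo data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["great movie", "wonderful acting", "terrible plot", "awful film"]
train_labels = ["positive", "positive", "negative", "negative"]

# TF-IDF maps each text to a sparse term-weight vector; LinearSVC then
# learns a max-margin separating hyperplane over those vectors.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

print(model.predict(["wonderful movie"]))
```

In practice the corpus would be split into train/test portions and the model compared against the other classifiers via F1 score and accuracy, as the abstract describes.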

A comparative study of transformer and transfer learning MT models for English-Manipuri
Kshetrimayum Boynao Singh | Ningthoujam Avichandra Singh | Loitongbam Sanayai Meetei | Ningthoujam Justwant Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

In this work, we focus on the development of machine translation (MT) models for a low-resource language pair, viz. English-Manipuri. Manipuri is one of the languages included in the Eighth Schedule of the Indian Constitution. It is currently written in two different scripts: its original script, called Meitei Mayek, and the Bengali script. We evaluate the performance of English-Manipuri MT models based on the transformer architecture and transfer learning techniques. Our MT models are trained on a dataset of 69,065 parallel sentences and validated on 500 sentences. On 500 test sentences, the English-to-Manipuri MT models achieved BLEU scores of 19.13 and 29.05 with mT5 and OpenNMT, respectively; the OpenNMT model thus significantly outperforms the mT5 model. Additionally, the Manipuri-to-English MT system trained with OpenNMT achieved a BLEU score of 30.90. We also carried out a comparative analysis between the Bengali script and the transliterated Meitei Mayek script for English-Manipuri MT models. This analysis reveals that the transliterated version enhances MT model performance, yielding a notable +2.35 improvement in BLEU score.