Navanath Saharia


2025

This paper describes DELAB-IIITM’s submission to the WMT25 machine translation shared task. We participated in two sub-tasks of the Indic Translation Task, en↔as and en↔mn, i.e. Assamese (an Indo-Aryan language) and Manipuri (a Tibeto-Burman language), covering six translation directions in total: mn→en, en→mn, en→as, as→en, mn→as, and as→mn. Our approach leverages the pretrained multilingual NLLB-200 model, a machine translation model developed by Meta AI as part of the No Language Left Behind (NLLB) project, through two main components: synthetic parallel corpus creation and strategic fine-tuning. Fine-tuning involves strict data cleaning protocols, the Adafactor optimizer with a low learning rate (2e-5), 2 training epochs, train-test data splits to guard against overfitting, and the Seq2SeqTrainer framework. The fine-tuned model was then used to translate the official test data into the target languages. Experimental results show that our method improves BLEU scores for both language pairs. These findings also confirm that back-translation remains challenging, largely due to morphological complexity and limited data availability.
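The abstract names data cleaning and train-test splitting as part of the fine-tuning process but does not list the specific cleaning filters. The sketch below is a hypothetical, stdlib-only illustration of those two steps on parallel (source, target) pairs. The filters (whitespace normalisation, dropping pairs with an empty side, dropping exact duplicates), the 0.34 hold-out ratio, the seed, and the toy corpus are all assumptions, not the submission's actual protocol.

```python
import random


def clean_parallel(pairs):
    """Normalise whitespace, then drop empty and duplicate (source, target) pairs.

    These filters are illustrative assumptions, not the submission's protocol.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # skip pairs with an empty side
        if (src, tgt) in seen:
            continue  # skip exact duplicates after normalisation
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


def train_test_split(pairs, test_ratio=0.1, seed=42):
    """Shuffle deterministically and hold out a test_ratio fraction for evaluation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy corpus with placeholder targets (not real Assamese/Manipuri data).
corpus = [
    ("hello  world", "tgt-1"),
    ("hello world", "tgt-1"),  # duplicate once whitespace is normalised
    ("", "tgt-empty"),         # empty source side
    ("good morning", "tgt-2"),
    ("thank you", "tgt-3"),
]
clean = clean_parallel(corpus)
train, test = train_test_split(clean, test_ratio=0.34)
print(len(clean), len(train), len(test))  # prints "3 2 1"
```

Holding out a fixed, seeded split lets validation loss or BLEU be tracked during the 2 training epochs, which is how a split helps detect overfitting.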

2021

This paper describes the system submitted by team DELab@IIITSM to sub-task 1 of the ICON-2021 Shared Task on multilingual gender-biased and communal language identification. We participated in two language-specific identification tasks (Meitei and Hindi) and one multilingual task (Meitei, Hindi, and Bangla, code-mixed with English). Our method comprises a pre-processing phase designed around the dataset, TF-IDF frequency-based feature extraction to build a feature vector for each instance, and a Decision Tree classifier. We obtained overall micro F1 scores of 0.629, 0.625, and 0.632 on the Hindi, Meitei, and multilingual datasets, respectively.
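A minimal sketch of a TF-IDF + Decision Tree text classifier scored with micro F1 is shown below, assuming scikit-learn is available. The toy texts and the "sports"/"weather" labels are hypothetical placeholders, not the ICON-2021 data or label set, and the default vectorizer settings are an assumption rather than the system's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; the real task used Meitei, Hindi and code-mixed text.
train_texts = [
    "the match starts at noon",
    "our team won the final match",
    "heavy rain expected today",
    "the weather is sunny and warm",
]
train_labels = ["sports", "sports", "weather", "weather"]
test_texts = ["the team lost the match", "rain and warm weather today"]
test_labels = ["sports", "weather"]

# TF-IDF turns each instance into a weighted term-frequency vector,
# which the decision tree then uses as its feature vector.
model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(train_texts, train_labels)

preds = model.predict(test_texts)
micro_f1 = f1_score(test_labels, preds, average="micro")
print(list(preds), round(micro_f1, 3))
```

For single-label classification, micro-averaged F1 pools all decisions before computing precision and recall, so it equals overall accuracy.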

2012

2009