Pramit Sahoo


2024

pdf bib
NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task
Pramit Sahoo | Maharaj Brahma | Maunendra Sankar Desarkar
Proceedings of the Ninth Conference on Machine Translation

In this paper, we describe our system for the WMT 24 shared task of Low-Resource Indic Language Translation. We consider eng↔{as, kha, lus, mni} as participating language pairs. In this shared task, we explore the fine-tuning of a pre-trained model motivated by the pre-trained objective of aligning embeddings closer by alignment augmentation (Lin et al.,2020) for 22 scheduled Indian languages. Our primary system is based on language-specific finetuning on a pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng→as, eng→kha, eng→lus, eng→mni respectively. We also explore multilingual training with/without language grouping and layer-freezing.

pdf bib
NLIP-Lab-IITH Multilingual MT System for WAT24 MT Shared Task
Maharaj Brahma | Pramit Sahoo | Maunendra Sankar Desarkar
Proceedings of the Ninth Conference on Machine Translation

This paper describes NLIP Lab’s multilingual machine translation system for the WAT24 shared task on multilingual Indic MT task for 22 scheduled languages belonging to 4 language families. We explore pre-training for Indic languages using alignment agreement objectives. We utilize bi-lingual dictionaries to substitute words from source sentences. Furthermore, we fine-tuned language direction-specific multilingual translation models using small and high-quality seed data. Our primary submission is a 243M parameters multilingual translation model covering 22 Indic languages. In the IN22-Gen benchmark, we achieved an average chrF++ score of 46.80 and 18.19 BLEU score for the En-Indic direction. In the Indic-En direction, we achieved an average chrF++ score of 56.34 and 30.82 BLEU score. In the In22-Conv benchmark, we achieved an average chrF++ score of 43.43 and BLEU score of 16.58 in the En-Indic direction, and in the Indic-En direction, we achieved an average of 52.44 and 29.77 for chrF++ and BLEU respectively. Our model is competitive with IndicTransv1 (474M parameter model).