Aditya Raghuwanshi
2024
SRIB-NMT’s Submission to the Indic MT Shared Task in WMT 2024
Pranamya Patil | Raghavendra Hr | Aditya Raghuwanshi | Kushal Verma
Proceedings of the Ninth Conference on Machine Translation
In the context of the Indic Low Resource Machine Translation (MT) challenge at WMT-24, we participated in four language pairs: English-Assamese (en-as), English-Mizo (en-mz), English-Khasi (en-kh), and English-Manipuri (en-mn). To address these tasks, we employed a transformer-based sequence-to-sequence architecture (Vaswani et al., 2017). In the PRIMARY system, which did not utilize external data, we first pretrained language models (low resource languages) using available monolingual data before finetuning them on small parallel datasets for translation. For the CONTRASTIVE submission approach, we utilized pretrained translation models like IndicTrans2 (Gala et al., 2023) and applied LoRA fine-tuning (Hu et al., 2021) to adapt them to smaller, low-resource languages, aiming to leverage cross-lingual language transfer capabilities (Conneau and Lample, 2019). These approaches resulted in significant improvements in SacreBLEU scores (Post, 2018) for low-resource languages.