NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task

Pramit Sahoo, Maharaj Brahma, Maunendra Sankar Desarkar


Abstract
In this paper, we describe our system for the WMT24 shared task on Low-Resource Indic Language Translation. We participate in the eng↔{as, kha, lus, mni} language pairs. In this shared task, we explore fine-tuning a pre-trained model, motivated by its pre-training objective of bringing embeddings closer through alignment augmentation (Lin et al., 2020) for 22 scheduled Indian languages. Our primary system is based on language-specific fine-tuning of a pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng→as, eng→kha, eng→lus, and eng→mni, respectively. We also explore multilingual training with and without language grouping and layer-freezing.
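The abstract mentions layer-freezing during fine-tuning. As a rough illustration only (the checkpoint name and the choice of which layers to freeze below are assumptions for this sketch, not the authors' configuration), freezing the encoder of a pre-trained seq2seq model with the HuggingFace Transformers API might look like:

```python
# Illustrative sketch of layer-freezing during language-specific
# fine-tuning. The base checkpoint and the decision to freeze the
# encoder are hypothetical assumptions, not taken from the paper.
from transformers import AutoModelForSeq2SeqLM

# Hypothetical pre-trained many-to-many translation checkpoint.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

# Freeze every encoder parameter so only the decoder-side weights
# are updated during fine-tuning.
for name, param in model.named_parameters():
    if name.startswith("model.encoder."):
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```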
Anthology ID: 2024.wmt-1.70
Volume: Proceedings of the Ninth Conference on Machine Translation
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 781–787
URL: https://aclanthology.org/2024.wmt-1.70
Cite (ACL): Pramit Sahoo, Maharaj Brahma, and Maunendra Sankar Desarkar. 2024. NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 781–787, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task (Sahoo et al., WMT 2024)
PDF: https://aclanthology.org/2024.wmt-1.70.pdf