NLIP-Lab-IITH Multilingual MT System for WAT24 MT Shared Task

Maharaj Brahma, Pramit Sahoo, Maunendra Sankar Desarkar


Abstract
This paper describes NLIP Lab’s multilingual machine translation system for the WAT24 shared task on multilingual Indic MT, which covers 22 scheduled languages belonging to 4 language families. We explore pre-training for Indic languages using alignment agreement objectives. We utilize bilingual dictionaries to substitute words in source sentences. Furthermore, we fine-tune language direction-specific multilingual translation models using small, high-quality seed data. Our primary submission is a 243M-parameter multilingual translation model covering 22 Indic languages. On the IN22-Gen benchmark, we achieved an average chrF++ score of 46.80 and a BLEU score of 18.19 in the En-Indic direction, and an average chrF++ score of 56.34 and a BLEU score of 30.82 in the Indic-En direction. On the IN22-Conv benchmark, we achieved an average chrF++ score of 43.43 and a BLEU score of 16.58 in the En-Indic direction, and an average chrF++ score of 52.44 and a BLEU score of 29.77 in the Indic-En direction. Our model is competitive with IndicTransv1 (a 474M-parameter model).
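The abstract states only that bilingual dictionaries are used to substitute words in source sentences; it does not give the procedure. The following is a minimal Python sketch of one way such substitution could look, not the authors' implementation. The function name `substitute_with_dictionary`, the `ratio` parameter, whitespace tokenization, and the toy English-Hindi dictionary are all assumptions made for illustration.

```python
import random


def substitute_with_dictionary(sentence, dictionary, ratio=0.3, seed=0):
    """Replace a fraction of source tokens with dictionary translations.

    Hypothetical helper: `ratio` and whitespace tokenization are
    illustrative assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    # Positions whose lowercased token has an entry in the dictionary.
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in dictionary]
    # Swap at most `ratio` of all tokens, limited by available candidates.
    n_swap = min(len(candidates), round(ratio * len(tokens)))
    for i in rng.sample(candidates, n_swap):
        # Pick one of the listed translations for the chosen token.
        tokens[i] = rng.choice(dictionary[tokens[i].lower()])
    return " ".join(tokens)


# Toy dictionary (assumed data, for illustration only).
toy_dict = {"water": ["पानी"], "house": ["घर"]}
mixed = substitute_with_dictionary("the house has water", toy_dict, ratio=0.5)
print(mixed)
```

In this sketch, tokens without a dictionary entry are left untouched, so the output is a code-mixed sentence that could be paired with the original as an alignment signal during pre-training.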
Anthology ID:
2024.wmt-1.74
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
804–809
URL:
https://aclanthology.org/2024.wmt-1.74
Cite (ACL):
Maharaj Brahma, Pramit Sahoo, and Maunendra Sankar Desarkar. 2024. NLIP-Lab-IITH Multilingual MT System for WAT24 MT Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 804–809, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
NLIP-Lab-IITH Multilingual MT System for WAT24 MT Shared Task (Brahma et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.74.pdf