JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation

Priyobroto Acharya, Haranath Mondal, Dipanjan Saha, Dipankar Das, Sivaji Bandyopadhyay


Abstract
Low-resource Indic languages such as Assamese, Manipuri, Mizo, and Bodo face persistent challenges in NMT due to limited parallel data, diverse scripts, and complex morphology. We address these issues in the WMT 2025 shared task by introducing a unified multilingual NMT framework that combines rigorous language-specific preprocessing with parameter-efficient adaptation of large-scale models. Our pipeline integrates the NLLB-200 and IndicTrans2 architectures, fine-tuned using LoRA and DoRA, reducing trainable parameters by over 90% without degrading translation quality. A comprehensive preprocessing suite, including Unicode normalization, semantic filtering, transliteration, and noise reduction, ensures high-quality inputs, while script-aware post-processing mitigates evaluation bias from orthographic mismatches. Experiments across English-Indic directions demonstrate that NLLB-200 achieves superior results for Assamese, Manipuri, and Mizo, whereas IndicTrans2 excels in English-Bodo. Evaluated using BLEU, chrF, METEOR, ROUGE-L, and TER, our approach yields consistent improvements over baselines, underscoring the effectiveness of combining efficient fine-tuning with linguistically informed preprocessing for low-resource Indic MT.
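As a rough illustration of why low-rank adaptation cuts the trainable parameter count, the sketch below compares a full weight matrix with the two LoRA factors that replace its update. The layer size (1024 x 1024) and rank (16) are hypothetical values chosen for the arithmetic; they are not taken from the paper, and the function names are illustrative only. The abstract's "over 90%" figure refers to the authors' full setup, which this per-matrix calculation does not reproduce.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Number of entries in a dense d_out x d_in weight matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Entries in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def reduction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of this matrix's parameters that are no longer trained."""
    return 1.0 - lora_params(d_in, d_out, rank) / full_params(d_in, d_out)


# Hypothetical projection layer and rank, for illustration only.
D_IN, D_OUT, RANK = 1024, 1024, 16

print(full_params(D_IN, D_OUT))          # dense parameters
print(lora_params(D_IN, D_OUT, RANK))    # trainable LoRA parameters
print(f"{reduction(D_IN, D_OUT, RANK):.2%}")
```

Because the LoRA count grows linearly in the layer width while the dense count grows quadratically, the saving becomes larger for wider layers at a fixed rank.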
Anthology ID:
2025.wmt-1.95
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1201–1209
URL:
https://aclanthology.org/2025.wmt-1.95/
Cite (ACL):
Priyobroto Acharya, Haranath Mondal, Dipanjan Saha, Dipankar Das, and Sivaji Bandyopadhyay. 2025. JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation. In Proceedings of the Tenth Conference on Machine Translation, pages 1201–1209, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
JU-NLP: Improving Low-Resource Indic Translation System with Efficient LoRA-Based Adaptation (Acharya et al., WMT 2025)
PDF:
https://aclanthology.org/2025.wmt-1.95.pdf