2025
Advancing Clinical Translation in Nepali through Fine-Tuned Multilingual Models
Benyamin Ahmadnia | Sumaiya Shaikh | Bibek Poudel | Shazan Mohammed | Sahar Hooshmand
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Low-resource Neural Machine Translation (NMT) remains a major challenge, particularly in high-stakes domains such as healthcare. This paper presents a domain-adapted pipeline for English-Nepali medical translation that leverages two state-of-the-art multilingual Large Language Models (LLMs): mBART and NLLB-200. A high-quality, domain-specific parallel corpus is curated, and both models are fine-tuned in PyTorch. Translation fidelity is assessed through a multi-metric evaluation strategy that combines BLEU, chrF++, METEOR, BERTScore, COMET, and perplexity. Our experimental results show that NLLB-200 consistently outperforms mBART on both surface-level and semantic metrics, achieving higher accuracy and lower hallucination rates in clinical settings. In addition, we conduct error profiling and an ethical assessment to highlight challenges such as term omission and cultural bias. This work underscores the viability of large-scale multilingual models for medical translation in low-resource languages and proposes actionable paths toward safer and more equitable MT deployment in healthcare.
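As an illustration of the fine-tuning step, here is a minimal sketch using the Hugging Face transformers and datasets libraries (an assumption; the abstract only states that PyTorch is used). The checkpoint facebook/nllb-200-distilled-600M, the FLORES-200 language codes eng_Latn/npi_Deva, the toy sentence pair, and all hyperparameters are illustrative stand-ins for the paper's curated medical corpus and actual training setup.

```python
# Minimal NLLB-200 fine-tuning sketch for English-Nepali translation.
# All checkpoint names, data, and hyperparameters below are assumptions,
# not the paper's reported configuration.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # smallest NLLB-200 variant

# NLLB uses FLORES-200 codes: English (Latin) -> Nepali (Devanagari).
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME, src_lang="eng_Latn", tgt_lang="npi_Deva"
)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Toy in-memory stand-in for the curated medical parallel corpus.
pairs = Dataset.from_dict({
    "en": ["Take one tablet twice daily after meals."],
    "ne": ["खाना खाएपछि दिनमा दुई पटक एक चक्की खानुहोस्।"],
})

def tokenize(batch):
    # text_target routes the Nepali references through the target tokenizer.
    return tokenizer(batch["en"], text_target=batch["ne"],
                     truncation=True, max_length=128)

tokenized = pairs.map(tokenize, batched=True, remove_columns=["en", "ne"])

args = Seq2SeqTrainingArguments(
    output_dir="nllb-med-en-ne",
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

An analogous setup with a multilingual mBART checkpoint such as facebook/mbart-large-50-many-to-many-mmt (again an assumption) would cover the mBART side of the comparison.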
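On the evaluation side, a minimal sketch of how the surface-level and semantic metrics might be computed with off-the-shelf libraries. The sacrebleu and bert-score packages, and the single hypothesis/reference pair, are assumptions; the paper does not name its tooling, and METEOR, COMET, and perplexity are omitted here for brevity.

```python
# Multi-metric scoring sketch: BLEU and chrF++ via sacrebleu,
# semantic similarity via bert-score. Data below is hypothetical.
import sacrebleu
from bert_score import score as bert_score

hypotheses = ["खाना पछि दिनमा दुई पटक एक चक्की लिनुहोस्।"]
references = [["खाना खाएपछि दिनमा दुई पटक एक चक्की खानुहोस्।"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
# word_order=2 turns chrF into the chrF++ variant reported in the paper.
chrf = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)
# bert-score falls back to multilingual BERT for Nepali ("ne").
_, _, f1 = bert_score(hypotheses, [r[0] for r in references], lang="ne")

print(f"BLEU:      {bleu.score:.2f}")
print(f"chrF++:    {chrf.score:.2f}")
print(f"BERTScore: {f1.mean().item():.4f}")
```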