Abdulmalik Danlami Mohammed
2026
Anchoring the Judge: Curriculum-Based Adaptation and Reference-Anchored MQM for LLM-Based Machine Translation of an Unseen Low-Resource Language - A Case of Nupe
Umar Baba Umar | Sulaimon Adebayo Bashir | Abdulmalik Danlami Mohammed
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Adapting large language models (LLMs) for machine translation has shown strong performance in low-resource languages; however, their effectiveness for unseen, extremely low-resource languages remains largely unexplored. We present NupeMT-QLoRA, a curriculum-based adaptation framework for the Nupe–English language pair. Our approach employs a two-stage QLoRA fine-tuning strategy: (i) initial training on 34k noisy parallel sentence pairs, followed by (ii) continued fine-tuning on a smaller, cleaner set of 12k bidirectional parallel sentences with explicit translation-direction tags. This staged curriculum stabilizes optimization and improves robustness under severe data scarcity.

We further identify a reliability crisis in existing automatic evaluation metrics for unseen languages. Popular LLM-based judges such as GEMBA and xCOMET exhibit weak correlation with human judgments (Kendall’s 𝜏 ≈ 0.21) and low inter-rater reliability (Fleiss’ 𝜅 ≈ 0.27), largely due to fluency bias. To address this, we propose Ref-Anchor-MQM, a reference-anchored evaluation protocol that forces the judge to extract Key Semantic Units from a human reference before scoring.

Experimental results show that NupeMT-QLoRA substantially outperforms NLLB-200, improving chrF++ from 22.73 to 41.10, while Ref-Anchor-MQM achieves significantly higher alignment with human evaluation (𝜏 = 0.71). Our framework provides a scalable pipeline for adapting and evaluating LLMs on languages with zero prior representation.
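The two-stage curriculum described above can be sketched in terms of data preparation: Stage 1 draws on the large noisy corpus, Stage 2 on the smaller clean bidirectional set with explicit direction tags. This is an illustrative sketch only, not the authors' code; the tag strings (`<nup2eng>`, `<eng2nup>`) and helper names are assumptions for illustration.

```python
# Illustrative sketch of the two-stage curriculum data preparation.
# Tag strings and function names are hypothetical, not from the paper.

DIRECTION_TAGS = {"nup2eng": "<nup2eng>", "eng2nup": "<eng2nup>"}

def tag_example(src, tgt, direction):
    """Prefix a source sentence with an explicit translation-direction tag."""
    return f"{DIRECTION_TAGS[direction]} {src}", tgt

def build_curriculum(noisy_pairs, clean_pairs):
    """Return the two training stages in curriculum order.

    Stage 1: the large noisy Nupe->English corpus (one direction).
    Stage 2: the clean set, used bidirectionally (two tagged examples
    per parallel pair), for continued fine-tuning.
    """
    stage1 = [tag_example(nup, eng, "nup2eng") for nup, eng in noisy_pairs]
    stage2 = []
    for nup, eng in clean_pairs:
        stage2.append(tag_example(nup, eng, "nup2eng"))
        stage2.append(tag_example(eng, nup, "eng2nup"))
    return stage1, stage2
```

In the actual framework, each stage would feed a separate QLoRA fine-tuning run, with Stage 2 initialized from the Stage 1 adapter; the sketch covers only the data-side curriculum ordering.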
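The anchoring idea behind Ref-Anchor-MQM can be illustrated with a toy lexical stand-in: extract Key Semantic Units (KSUs) from the human reference first, then score the hypothesis by how many of those units it covers. In the real protocol an LLM judge performs both steps under an MQM rubric; the content-word heuristic and coverage ratio below are simplifications introduced here purely to show why anchoring resists fluency bias.

```python
# Toy stand-in for reference-anchored scoring (not the paper's judge).
STOPWORDS = frozenset({"the", "a", "an", "is", "of", "to", "and", "in"})

def extract_ksus(reference):
    """Approximate Key Semantic Units as the reference's content words.
    (The real protocol asks an LLM judge to extract KSUs.)"""
    return [w for w in reference.lower().split() if w not in STOPWORDS]

def ref_anchor_score(hypothesis, reference):
    """Fraction of reference KSUs covered by the hypothesis.

    A fluent but semantically unrelated hypothesis covers few KSUs and
    scores low, which is the point of anchoring the judge to the reference.
    """
    ksus = extract_ksus(reference)
    if not ksus:
        return 1.0
    hyp_words = set(hypothesis.lower().split())
    return sum(1 for k in ksus if k in hyp_words) / len(ksus)
```

For example, against the reference "the farmer sold the rice", a faithful hypothesis covers all three KSUs (farmer, sold, rice), while a fluent but wrong one such as "a merchant bought some beans" covers none, regardless of how natural it reads.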