MTNLP-IIITH: Machine Translation for Low-Resource Indic Languages

Abhinav P M, Ketaki Shetye, Parameswari Krishnamurthy


Abstract
Machine Translation for low-resource languages presents significant challenges, primarily due to limited data availability. We develop two systems: a baseline model and a primary model. For the baseline, we fine-tune the mBART model (mbart-large-50-many-to-many-mmt) on the language pairs English-Khasi, Khasi-English, English-Manipuri, and Manipuri-English. We then augment the dataset by back-translating from the Indic languages into English. To improve data quality, we fine-tune the LaBSE model for Khasi and Manipuri, generate sentence embeddings, and apply a cosine similarity threshold of 0.84 to filter out low-quality back-translations. The filtered data is combined with the original training data and used to further fine-tune the mBART model, yielding our primary model. The results show that the primary model slightly outperforms the baseline, with the best performance achieved by the English-to-Khasi (en-kh) primary model, which obtains a BLEU score of 0.0492, a chrF score of 0.3316, and a METEOR score of 0.2589 (on a scale of 0 to 1); other language pairs show similar results.
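
A minimal sketch of the similarity-based filtering step described above, assuming the publicly available LaBSE checkpoint from sentence-transformers rather than the authors' fine-tuned weights; the sentence lists and variable names are hypothetical illustrations, not the released code.

from sentence_transformers import SentenceTransformer

THRESHOLD = 0.84  # cosine-similarity cut-off reported in the abstract

# Hypothetical parallel lists: Indic source sentences and their English back-translations.
indic_sentences = ["Khasi or Manipuri sentence 1", "Khasi or Manipuri sentence 2"]
back_translations = ["English back-translation 1", "English back-translation 2"]

model = SentenceTransformer("sentence-transformers/LaBSE")

# Normalized embeddings, so the elementwise dot product equals cosine similarity.
src_emb = model.encode(indic_sentences, convert_to_tensor=True, normalize_embeddings=True)
bt_emb = model.encode(back_translations, convert_to_tensor=True, normalize_embeddings=True)
scores = (src_emb * bt_emb).sum(dim=1)

# Keep only sentence pairs whose similarity clears the threshold.
filtered_pairs = [
    (src, bt)
    for src, bt, score in zip(indic_sentences, back_translations, scores.tolist())
    if score >= THRESHOLD
]

In the paper's pipeline, the surviving pairs are merged with the original training data before the second round of mBART fine-tuning that produces the primary model.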
Anthology ID:
2024.wmt-1.65
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
751–755
URL:
https://aclanthology.org/2024.wmt-1.65
Cite (ACL):
Abhinav P M, Ketaki Shetye, and Parameswari Krishnamurthy. 2024. MTNLP-IIITH: Machine Translation for Low-Resource Indic Languages. In Proceedings of the Ninth Conference on Machine Translation, pages 751–755, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MTNLP-IIITH: Machine Translation for Low-Resource Indic Languages (P M et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.65.pdf