The SETU-ADAPT Submissions to the WMT24 Low-Resource Indic Language Translation Task

Neha Gajakos, Prashanth Nayak, Rejwanul Haque, Andy Way


Abstract
This paper presents the SETU-ADAPT submissions to the WMT 2024 Low-Resource Indic Language Translation task. We participated in the unconstrained segment of the task, focusing on the Assamese-to-English and English-to-Assamese language pairs. Our approach uses Large Language Models (LLMs) as the baseline systems for all our MT tasks, and we applied three strategies to improve on those baselines. In our first approach, we fine-tuned LLMs using all the data provided by the task organisers. Our second approach explores in-context learning through few-shot prompting. In our final approach, we explore an efficient data extraction technique that uses a fuzzy match-based similarity measure to select data for fine-tuning. We evaluated our systems using BLEU, chrF, WER, and COMET. The experimental results showed that our strategies can effectively improve the quality of translations in low-resource scenarios.
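
The fuzzy match-based data extraction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, threshold, or selection procedure, so everything below is an assumption made for illustration: Python's standard-library `difflib.SequenceMatcher` stands in for the fuzzy match score, and the names `fuzzy_score`, `select_similar_pairs`, `threshold`, and `top_k` are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two sentences, compared token by token.

    Assumption: difflib's ratio is used here as a stand-in for the
    fuzzy match measure, which the abstract does not define.
    """
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def select_similar_pairs(test_sources, train_pairs, threshold=0.5, top_k=2):
    """For each test-side source sentence, keep up to `top_k` training pairs
    whose source side scores at least `threshold` against it.

    Returns a de-duplicated list of (source, target) pairs that could serve
    as a small fine-tuning set. `threshold` and `top_k` are illustrative.
    """
    selected = []
    seen = set()
    for query in test_sources:
        scored = [(fuzzy_score(query, src), src, tgt) for src, tgt in train_pairs]
        # Highest-scoring training sentences first.
        scored.sort(key=lambda item: item[0], reverse=True)
        for score, src, tgt in scored[:top_k]:
            if score >= threshold and (src, tgt) not in seen:
                seen.add((src, tgt))
                selected.append((src, tgt))
    return selected


if __name__ == "__main__":
    train = [
        ("the cat sat on the mat", "T1"),
        ("stock prices fell sharply today", "T2"),
        ("the cat sat on the sofa", "T3"),
    ]
    for pair in select_similar_pairs(["the cat sat on the chair"], train):
        print(pair)
```

The sketch scores every training sentence against every query, which is quadratic in the data size; a real pipeline over a large corpus would likely need an index or a pre-filtering step, which is outside what the abstract describes.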
Anthology ID:
2024.wmt-1.67
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
762–769
URL:
https://aclanthology.org/2024.wmt-1.67
Cite (ACL):
Neha Gajakos, Prashanth Nayak, Rejwanul Haque, and Andy Way. 2024. The SETU-ADAPT Submissions to the WMT24 Low-Resource Indic Language Translation Task. In Proceedings of the Ninth Conference on Machine Translation, pages 762–769, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
The SETU-ADAPT Submissions to the WMT24 Low-Resource Indic Language Translation Task (Gajakos et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.67.pdf