NICT-AI4B’s Submission to the Indic MT Shared Task in WMT 2023

Raj Dabre, Jay Gala, Pranjal Chitale


Abstract
In this paper, we (Team NICT-AI4B) describe the MT systems that we submit to the Indic MT task in WMT 2023. Our primary system consists of 3 stages: joint denoising and MT training using officially approved monolingual and parallel corpora, backtranslation, and MT training on original and backtranslated parallel corpora. We observe that backtranslation leads to substantial improvements in translation quality of up to 4 BLEU points. We also develop 2 contrastive systems in unconstrained settings, where the first system involves fine-tuning of IndicTrans2 DA models on official parallel corpora and seed data used in AI4Bharat et al. (2023), and the second system involves a system combination of the primary and the aforementioned system. Overall, we manage to obtain high-quality translation systems for the 4 low-resource North-East Indian languages of focus.
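The backtranslation stage mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give model, data, or decoding details, so every name below (`backtranslate`, `build_training_corpus`, `toy_reverse_model`) and the placeholder sentences are hypothetical. The sketch only shows the general idea of translating target-side monolingual text back into the source language and mixing the resulting synthetic pairs with the original parallel data.

```python
from typing import Callable, List, Tuple

# A parallel example is a (source sentence, target sentence) pair.
Pair = Tuple[str, str]


def backtranslate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    `reverse_translate` stands in for a target-to-source MT model; it is an
    assumption of this sketch, not an API from the paper.
    """
    return [(reverse_translate(sentence), sentence) for sentence in target_monolingual]


def build_training_corpus(original: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Combine original and backtranslated pairs for the final MT training stage."""
    return list(original) + list(synthetic)


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical stand-in for a trained target-to-source model."""
    return "<synthetic source for> " + sentence


# Placeholder data; real corpora would be loaded from files.
original_pairs = [("source sentence 1", "target sentence 1")]
monolingual_targets = ["target sentence 2", "target sentence 3"]

synthetic_pairs = backtranslate(monolingual_targets, toy_reverse_model)
corpus = build_training_corpus(original_pairs, synthetic_pairs)
```

In this sketch the target side of every synthetic pair is genuine text and only the source side is machine-generated, which is the property backtranslation typically relies on.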
Anthology ID:
2023.wmt-1.88
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
941–949
URL:
https://aclanthology.org/2023.wmt-1.88
DOI:
10.18653/v1/2023.wmt-1.88
Cite (ACL):
Raj Dabre, Jay Gala, and Pranjal Chitale. 2023. NICT-AI4B’s Submission to the Indic MT Shared Task in WMT 2023. In Proceedings of the Eighth Conference on Machine Translation, pages 941–949, Singapore. Association for Computational Linguistics.
Cite (Informal):
NICT-AI4B’s Submission to the Indic MT Shared Task in WMT 2023 (Dabre et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.88.pdf