NIT Rourkela Machine Translation(MT) System Submission to WAT 2022 for MultiIndicMT: An Indic Language Multilingual Shared Task

Sudhansu Bala Das; Atharv Biradar; Tapas Kumar Mishra; Bidyut Kumar Patra

Abstract

Multilingual Neural Machine Translation (MNMT) has shown strong performance by training a single translation model for many languages. Previous studies on multilingual translation show that multilingual training is effective for languages with limited corpora. This paper presents our submission (Team Id: NITR) to the WAT 2022 "MultiIndicMT shared task", whose objective is translation between English and five Indic languages from the OPUS corpus (newly added to the WAT 2022 data), in both directions, using the corpus provided by the WAT organizers. Our system is a Transformer-based NMT model built with the fairseq toolkit and uses ensemble techniques. Heuristic pre-processing is applied to the data before training. Our multilingual NMT systems are trained with shared encoder and decoder parameters, and a language embedding is assigned to each token in both the encoder and the decoder. Our final multilingual system was evaluated with the BLEU and RIBES metrics. In future work, we plan to extend this research to fine-tune both the encoder and the decoder during monolingual unsupervised training, in order to improve the quality of the synthetic data generated in the process.
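To make "shared encoder and decoder parameters with a language embedding on each token" concrete, the following is a minimal, hypothetical sketch in plain Python, not the authors' fairseq configuration. It assumes that the language embedding is summed element-wise with the token embedding (the way positional embeddings are usually added); the abstract does not say whether the embeddings are summed, concatenated, or combined some other way. The names `make_table` and `embed_with_language`, the toy vocabulary, the dimension, and the placeholder language tag `xx` (standing in for any of the Indic languages) are all illustrative assumptions.

```python
import random


def make_table(keys, dim, seed):
    """Build a toy embedding table mapping each key to a random vector (hypothetical)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def embed_with_language(tokens, lang, tok_table, lang_table):
    """Return one vector per token: token embedding + embedding of the sentence's language.

    Assumption: the language embedding is summed element-wise with the token
    embedding; the source abstract does not specify how they are combined.
    """
    lang_vec = lang_table[lang]
    return [[t + l for t, l in zip(tok_table[tok], lang_vec)] for tok in tokens]


DIM = 4
VOCAB = ["hello", "world", "x1", "x2", "</s>"]  # toy shared vocabulary
LANGS = ["en", "xx"]  # "xx" is a placeholder tag for an Indic language

# One token table and one language table, shared by the encoder and the decoder.
tok_table = make_table(VOCAB, DIM, seed=0)
lang_table = make_table(LANGS, DIM, seed=1)

# Encoder input (Indic source) and decoder input (English target) look up the
# same tables; only the language tag attached to every token differs.
enc_in = embed_with_language(["x1", "x2", "</s>"], "xx", tok_table, lang_table)
dec_in = embed_with_language(["hello", "world", "</s>"], "en", tok_table, lang_table)
```

Because the tables are shared, the same token (here `</s>`) receives different input vectors on the two sides only through its language embedding, which is what lets one set of parameters serve every translation direction.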

Anthology ID:: 2022.wat-1.8
Volume:: Proceedings of the 9th Workshop on Asian Translation
Month:: October
Year:: 2022
Address:: Gyeongju, Republic of Korea
Venue:: WAT
Publisher:: International Conference on Computational Linguistics
Pages:: 73–77
URL:: https://aclanthology.org/2022.wat-1.8/
Cite (ACL):: Sudhansu Bala Das, Atharv Biradar, Tapas Kumar Mishra, and Bidyut Kumar Patra. 2022. NIT Rourkela Machine Translation(MT) System Submission to WAT 2022 for MultiIndicMT: An Indic Language Multilingual Shared Task. In Proceedings of the 9th Workshop on Asian Translation, pages 73–77, Gyeongju, Republic of Korea. International Conference on Computational Linguistics.
Cite (Informal):: NIT Rourkela Machine Translation(MT) System Submission to WAT 2022 for MultiIndicMT: An Indic Language Multilingual Shared Task (Das et al., WAT 2022)
PDF:: https://aclanthology.org/2022.wat-1.8.pdf
