IIIT Hyderabad Submission To WAT 2021: Efficient Multilingual NMT systems for Indian languages

Sourav Kumar, Salil Aggarwal, Dipti Sharma


Abstract
This paper describes the work and the systems submitted by the IIIT-Hyderabad team to the WAT 2021 MultiIndicMT shared task. The task covers 10 major languages of the Indian subcontinent. Within the scope of this task, we built multilingual systems for 20 translation directions, namely English-Indic (one-to-many) and Indic-English (many-to-one). Individually, Indian languages are resource-poor, which hampers translation quality; by leveraging multilingualism and abundant monolingual corpora, however, translation quality can be substantially boosted. Multilingual systems are nevertheless expensive to train in terms of both time and computational resources. We therefore train our systems on efficiently selected data that contributes most to the learning process. Furthermore, we exploit the relatedness among Indian languages. All comparisons were made using BLEU, and our final multilingual system significantly outperforms the baselines by an average of 11.3 and 19.6 BLEU points for the English-Indic (en-xx) and Indic-English (xx-en) directions, respectively.
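The abstract does not say how language relatedness is exploited. One common technique in Indic multilingual NMT is to map the different Indic scripts onto a single script (here Devanagari), so that cognates and shared subwords line up across languages. This works because the Unicode blocks for these scripts follow a largely parallel, ISCII-derived layout. The sketch below illustrates that idea together with a target-language tag for a one-to-many (en-xx) model. Both are assumptions for illustration, not a description of the authors' system. The names `to_devanagari` and `add_target_tag` and the `<2xx>` tag format are hypothetical. Note that a naive offset mapping can yield code points that are unassigned in Devanagari for a few script-specific characters.

```python
# Hypothetical sketch: unify Indic scripts by Unicode block offset and tag
# source sentences with a target language (assumed technique, not the paper's).

# Start code points of the Unicode blocks for the scripts used by the ten
# MultiIndicMT languages; each block is 0x80 code points wide.
SCRIPT_BLOCK_START = {
    "hi": 0x0900,  # Devanagari (Hindi)
    "mr": 0x0900,  # Devanagari (Marathi)
    "bn": 0x0980,  # Bengali
    "pa": 0x0A00,  # Gurmukhi (Punjabi)
    "gu": 0x0A80,  # Gujarati
    "or": 0x0B00,  # Oriya
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
DEVANAGARI_START = 0x0900
BLOCK_SIZE = 0x80


def to_devanagari(text: str, lang: str) -> str:
    """Shift characters in the language's script block onto the Devanagari block.

    Characters outside that block (Latin, digits, punctuation) are kept as-is.
    """
    start = SCRIPT_BLOCK_START[lang]
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + BLOCK_SIZE:
            out.append(chr(cp - start + DEVANAGARI_START))
        else:
            out.append(ch)
    return "".join(out)


def add_target_tag(src: str, tgt_lang: str) -> str:
    """Prepend a hypothetical target-language token for a one-to-many model."""
    return f"<2{tgt_lang}> {src}"


if __name__ == "__main__":
    print(to_devanagari("আমি", "bn"))  # Bengali "ami" -> "आमि"
    print(add_target_tag("I am here.", "bn"))  # -> "<2bn> I am here."
```

For example, the Bengali word "আমি" becomes "आमि", so a model with a shared vocabulary can see it in the same script as related Hindi or Marathi tokens. For en-xx output, the Devanagari-script predictions would need the inverse offset mapping back to the target script.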
Anthology ID: 2021.wat-1.25
Volume: Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Month: August
Year: 2021
Address: Online
Editors: Toshiaki Nakazawa, Hideki Nakayama, Isao Goto, Hideya Mino, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Shohei Higashiyama, Hiroshi Manabe, Win Pa Pa, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
Venue: WAT
Publisher: Association for Computational Linguistics
Pages: 212–216
URL: https://aclanthology.org/2021.wat-1.25
DOI: 10.18653/v1/2021.wat-1.25
Cite (ACL): Sourav Kumar, Salil Aggarwal, and Dipti Sharma. 2021. IIIT Hyderabad Submission To WAT 2021: Efficient Multilingual NMT systems for Indian languages. In Proceedings of the 8th Workshop on Asian Translation (WAT2021), pages 212–216, Online. Association for Computational Linguistics.
Cite (Informal): IIIT Hyderabad Submission To WAT 2021: Efficient Multilingual NMT systems for Indian languages (Kumar et al., WAT 2021)
PDF: https://aclanthology.org/2021.wat-1.25.pdf