Multilingual Neural Machine Translation involving Indian Languages

Pulkit Madaan, Fatiha Sadat


Abstract
Neural Machine Translation (NMT) models translate a single language pair and require a new model for each new pair. Multilingual NMT models can translate multiple language pairs, including pairs they have not seen during training. The limited availability of parallel sentences is a known problem in machine translation. A multilingual NMT model leverages information from all of its languages to improve translation quality. We propose a data augmentation technique that improves this model further, yielding a gain of more than 15 BLEU points over the multilingual NMT model. A BLEU score of 36.2 was achieved for Sindhi–English translation, which is higher than any score on the leaderboard of the LoResMT Shared Task at MT Summit 2019, which provided the data for the experiments.
Anthology ID:
2020.wildre-1.6
Volume:
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Girish Nath Jha, Kalika Bali, Sobha L., S. S. Agrawal, Atul Kr. Ojha
Venue:
WILDRE
Publisher:
European Language Resources Association (ELRA)
Pages:
29–32
Language:
English
URL:
https://aclanthology.org/2020.wildre-1.6
Cite (ACL):
Pulkit Madaan and Fatiha Sadat. 2020. Multilingual Neural Machine Translation involving Indian Languages. In Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation, pages 29–32, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Multilingual Neural Machine Translation involving Indian Languages (Madaan & Sadat, WILDRE 2020)
PDF:
https://aclanthology.org/2020.wildre-1.6.pdf