@inproceedings{kumar-etal-2021-multilingual,
title = "Multilingual Multi-Domain {NMT} for {I}ndian Languages",
author = "Kumar, Sourav and
Aggarwal, Salil and
Sharma, Dipti",
editor = "Mitkov, Ruslan and
Angelova, Galia",
booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
month = sep,
year = "2021",
address = "Held Online",
publisher = "INCOMA Ltd.",
url = "https://aclanthology.org/2021.ranlp-1.83",
pages = "727--733",
abstract = "India is known as the land of many tongues and dialects. Neural machine translation (NMT) is the current state-of-the-art approach for machine translation (MT) but performs better only with large datasets which Indian languages usually lack, making this approach infeasible. So, in this paper, we address the problem of data scarcity by efficiently training multilingual and multilingual multi domain NMT systems involving languages of the ๐๐ง๐๐ข๐๐ง ๐ฌ๐ฎ๐๐๐จ๐ง๐ญ๐ข๐ง๐๐ง๐ญ. We are proposing the technique for using the joint domain and language tags in a multilingual setup. We draw three major conclusions from our experiments: (i) Training a multilingual system via exploiting lexical similarity based on language family helps in achieving an overall average improvement of ๐.๐๐ ๐๐๐๐ ๐ฉ๐จ๐ข๐ง๐ญ๐ฌ over bilingual baselines, (ii) Technique of incorporating domain information into the language tokens helps multilingual multi-domain system in getting a significant average improvement of ๐ ๐๐๐๐ ๐ฉ๐จ๐ข๐ง๐ญ๐ฌ over the baselines, (iii) Multistage fine-tuning further helps in getting an improvement of ๐-๐.๐ ๐๐๐๐ ๐ฉ๐จ๐ข๐ง๐ญ๐ฌ for the language pair of interest.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="kumar-etal-2021-multilingual">
<titleInfo>
<title>Multilingual Multi-Domain NMT for Indian Languages</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sourav</namePart>
<namePart type="family">Kumar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Salil</namePart>
<namePart type="family">Aggarwal</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Dipti</namePart>
<namePart type="family">Sharma</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-09</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ruslan</namePart>
<namePart type="family">Mitkov</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Galia</namePart>
<namePart type="family">Angelova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>INCOMA Ltd.</publisher>
<place>
<placeTerm type="text">Held Online</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>India is known as the land of many tongues and dialects. Neural machine translation (NMT) is the current state-of-the-art approach for machine translation (MT) but performs well only with large datasets, which Indian languages usually lack, making this approach infeasible. In this paper, we address the problem of data scarcity by efficiently training multilingual and multilingual multi-domain NMT systems involving languages of the Indian subcontinent. We propose a technique for using joint domain and language tags in a multilingual setup. We draw three major conclusions from our experiments: (i) training a multilingual system that exploits lexical similarity based on language family achieves an overall average improvement of 3.25 BLEU points over bilingual baselines, (ii) incorporating domain information into the language tokens gives the multilingual multi-domain system a significant average improvement of 6 BLEU points over the baselines, and (iii) multistage fine-tuning yields a further improvement of 1-1.5 BLEU points for the language pair of interest.</abstract>
<identifier type="citekey">kumar-etal-2021-multilingual</identifier>
<location>
<url>https://aclanthology.org/2021.ranlp-1.83</url>
</location>
<part>
<date>2021-09</date>
<extent unit="page">
<start>727</start>
<end>733</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Multilingual Multi-Domain NMT for Indian Languages
%A Kumar, Sourav
%A Aggarwal, Salil
%A Sharma, Dipti
%Y Mitkov, Ruslan
%Y Angelova, Galia
%S Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
%D 2021
%8 September
%I INCOMA Ltd.
%C Held Online
%F kumar-etal-2021-multilingual
%X India is known as the land of many tongues and dialects. Neural machine translation (NMT) is the current state-of-the-art approach for machine translation (MT) but performs well only with large datasets, which Indian languages usually lack, making this approach infeasible. In this paper, we address the problem of data scarcity by efficiently training multilingual and multilingual multi-domain NMT systems involving languages of the Indian subcontinent. We propose a technique for using joint domain and language tags in a multilingual setup. We draw three major conclusions from our experiments: (i) training a multilingual system that exploits lexical similarity based on language family achieves an overall average improvement of 3.25 BLEU points over bilingual baselines, (ii) incorporating domain information into the language tokens gives the multilingual multi-domain system a significant average improvement of 6 BLEU points over the baselines, and (iii) multistage fine-tuning yields a further improvement of 1-1.5 BLEU points for the language pair of interest.
%U https://aclanthology.org/2021.ranlp-1.83
%P 727-733
Markdown (Informal)
[Multilingual Multi-Domain NMT for Indian Languages](https://aclanthology.org/2021.ranlp-1.83) (Kumar et al., RANLP 2021)
ACL
- Sourav Kumar, Salil Aggarwal, and Dipti Sharma. 2021. Multilingual Multi-Domain NMT for Indian Languages. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 727–733, Held Online. INCOMA Ltd.
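
The abstract describes conditioning a single multilingual NMT model on joint language and domain tags prepended to the source sentence. As a minimal sketch of that idea: the tag format (`<2hi>`, `<health>`, etc.), the `tag_source` helper, and the sample sentences below are illustrative assumptions, not the paper's exact token scheme, which is not given in this entry.

```python
# Sketch of joint language + domain tagging for multilingual
# multi-domain NMT, following the idea in the abstract above.
# Tag names and sentences are hypothetical examples.

def tag_source(sentence: str, tgt_lang: str, domain: str) -> str:
    """Prepend a joint target-language and domain token to the source."""
    return f"<2{tgt_lang}> <{domain}> {sentence}"

# (source sentence, target language code, domain) triples
corpus = [
    ("this medicine should be taken twice a day", "hi", "health"),
    ("the court adjourned the hearing", "te", "judicial"),
]

# The tagged sentences are fed to one shared encoder-decoder model,
# which learns to condition its output on both tokens.
for src, lang, dom in corpus:
    print(tag_source(src, lang, dom))
```

On this reading, multistage fine-tuning would then start from the shared multilingual multi-domain model and continue training on progressively narrower data (e.g., the language pair of interest, then its target domain), which is consistent with the abstract's reported per-pair gains.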