Study on Unsupervised Statistical Machine Translation for Backtranslation

Anush Kumar, Nihal V. Nayak, Aditya Chandra, Mydhili K. Nair


Abstract
Machine Translation systems have drastically improved over the years for several language pairs. Monolingual data is often used to generate synthetic sentences that augment the training data, which has been shown to improve the performance of machine translation models. In our paper, we use an Unsupervised Statistical Machine Translation (USMT) system to generate synthetic sentences. Our study compares the performance improvements in a Neural Machine Translation model when using synthetic sentences from supervised and unsupervised Machine Translation models. Our approach of using USMT for backtranslation shows promise in low-resource conditions and achieves an improvement of 3.2 BLEU points over the Neural Machine Translation model.
Anthology ID:
R19-1068
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
578–582
URL:
https://aclanthology.org/R19-1068
DOI:
10.26615/978-954-452-056-4_068
Cite (ACL):
Anush Kumar, Nihal V. Nayak, Aditya Chandra, and Mydhili K. Nair. 2019. Study on Unsupervised Statistical Machine Translation for Backtranslation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 578–582, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Study on Unsupervised Statistical Machine Translation for Backtranslation (Kumar et al., RANLP 2019)
PDF:
https://aclanthology.org/R19-1068.pdf