IITP-MT at CALCS2021: English to Hinglish Neural Machine Translation using Unsupervised Synthetic Code-Mixed Parallel Corpus

Ramakrishna Appicharla, Kamal Kumar Gupta, Asif Ekbal, Pushpak Bhattacharyya


Abstract
This paper describes the system submitted by the IITP-MT team to the Computational Approaches to Linguistic Code-Switching (CALCS 2021) shared task on machine translation (MT) for English→Hinglish. We submit a neural machine translation (NMT) system trained on a synthetic code-mixed (cm) English-Hinglish parallel corpus. We propose an approach to create a code-mixed parallel corpus from a clean parallel corpus in an unsupervised manner. The approach is alignment-based, and we do not use any linguistic resources to explicitly mark tokens for code-switching. We also train an NMT model on the gold corpus provided by the workshop organizers, augmented with the generated synthetic code-mixed parallel corpus. The model trained on the generated synthetic cm data achieves 10.09 BLEU points on the given test set.
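To make the idea of alignment-based synthetic code-mixing concrete, here is a minimal sketch. It is not the authors' procedure: the abstract does not say which tokens get switched. The function name `make_code_mixed`, the one-to-one-link selection rule, and the `ratio`/`seed` parameters are all hypothetical. The sketch assumes word alignments are given as (source_index, target_index) pairs, for example parsed from the Pharaoh-style `i-j` output of an aligner such as fast_align. The target side is shown romanized for readability. Some target tokens are replaced by the English tokens aligned to them, and no POS tags, dictionaries, or other linguistic resources are used.

```python
import random
from collections import defaultdict


def make_code_mixed(src_tokens, tgt_tokens, alignments, ratio=0.5, seed=0):
    """Hypothetical sketch: build a code-mixed target sentence by replacing a
    fraction of one-to-one aligned target tokens with their English source tokens.

    alignments: iterable of (source_index, target_index) pairs (assumed format).
    ratio: fraction of eligible target tokens to switch (hypothetical knob).
    """
    src_links = defaultdict(set)  # source index -> aligned target indices
    tgt_links = defaultdict(set)  # target index -> aligned source indices
    for i, j in alignments:
        src_links[i].add(j)
        tgt_links[j].add(i)

    # Only switch one-to-one links, so each replaced token has an unambiguous
    # English counterpart (an assumed heuristic, not taken from the paper).
    candidates = sorted(
        j
        for j, srcs in tgt_links.items()
        if len(srcs) == 1 and len(src_links[next(iter(srcs))]) == 1
    )

    rng = random.Random(seed)  # seeded for reproducible corpora
    k = int(ratio * len(candidates))
    chosen = set(rng.sample(candidates, k))

    # Replace chosen target tokens with their single aligned source token.
    return [
        src_tokens[next(iter(tgt_links[j]))] if j in chosen else tok
        for j, tok in enumerate(tgt_tokens)
    ]


if __name__ == "__main__":
    src = ["I", "like", "green", "tea"]
    tgt = ["mujhe", "hari", "chai", "pasand", "hai"]
    align = [(0, 0), (1, 3), (1, 4), (2, 1), (3, 2)]
    print(" ".join(make_code_mixed(src, tgt, align, ratio=1.0)))
```

With `ratio=1.0` the example prints `I green tea pasand hai`. "like" is aligned to two Hindi tokens, so "pasand hai" stays in Hindi. The switched pronoun also shows why a real system would need a better selection criterion than "any one-to-one link".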
Anthology ID: 2021.calcs-1.5
Volume: Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching
Month: June
Year: 2021
Address: Online
Venues: CALCS | NAACL
Publisher: Association for Computational Linguistics
Pages: 31–35
URL: https://aclanthology.org/2021.calcs-1.5
DOI: 10.18653/v1/2021.calcs-1.5
PDF: https://aclanthology.org/2021.calcs-1.5.pdf