The University of Edinburgh’s Submission to the WMT22 Code-Mixing Shared Task (MixMT)

Faheem Kirefu, Vivek Iyer, Pinzhen Chen, Laurie Burchell


Abstract
The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. The task consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, especially the use of backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding on English and transliterated subwords in order to produce Hinglish. For subtask 2, we investigated different pretraining techniques, namely comparing simple initialisation from existing machine translation models and aligned augmentation. For both subtasks, we found that our baseline systems worked best. Our systems for both subtasks were among the overall top-performing submissions.
Anthology ID:
2022.wmt-1.115
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
1145–1157
URL:
https://aclanthology.org/2022.wmt-1.115
Cite (ACL):
Faheem Kirefu, Vivek Iyer, Pinzhen Chen, and Laurie Burchell. 2022. The University of Edinburgh’s Submission to the WMT22 Code-Mixing Shared Task (MixMT). In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 1145–1157, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
The University of Edinburgh’s Submission to the WMT22 Code-Mixing Shared Task (MixMT) (Kirefu et al., WMT 2022)
PDF:
https://aclanthology.org/2022.wmt-1.115.pdf