A3-108 Machine Translation System for LoResMT Shared Task @MT Summit 2021 Conference

Saumitra Yadav, Manish Shrivastava


Abstract
In this paper, we describe our submissions to the LoResMT Shared Task @MT Summit 2021 Conference. We built statistical translation systems in each direction for the English ⇐⇒ Marathi language pair. This paper outlines initial baseline experiments that use various tokenization schemes to train models. Using the optimal tokenization scheme, we create synthetic data and train on the augmented dataset to build additional statistical models. We also reorder English to match Marathi syntax and train another set of baseline and data-augmented models using various tokenization schemes. We report the configurations of the submitted systems and the results they produced.
Anthology ID:
2021.mtsummit-loresmt.12
Volume:
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)
Month:
August
Year:
2021
Address:
Virtual
Editors:
John Ortega, Atul Kr. Ojha, Katharina Kann, Chao-Hong Liu
Venue:
LoResMT
Publisher:
Association for Machine Translation in the Americas
Pages:
124–128
URL:
https://aclanthology.org/2021.mtsummit-loresmt.12
Cite (ACL):
Saumitra Yadav and Manish Shrivastava. 2021. A3-108 Machine Translation System for LoResMT Shared Task @MT Summit 2021 Conference. In Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021), pages 124–128, Virtual. Association for Machine Translation in the Americas.
Cite (Informal):
A3-108 Machine Translation System for LoResMT Shared Task @MT Summit 2021 Conference (Yadav & Shrivastava, LoResMT 2021)
PDF:
https://aclanthology.org/2021.mtsummit-loresmt.12.pdf