Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap

Pranav Gaikwad, Meet Doshi, Sourabh Deoghare, Pushpak Bhattacharyya


Abstract
This paper describes the MT systems that the CFILT-IITB team submitted to the WMT23 IndicMT shared task. The task focused on developing MT systems between English and four low-resource North-East Indian languages, viz., Assamese, Khasi, Manipuri, and Mizo. Training these systems on a small parallel corpus resulted in poor-quality systems. We therefore used transfer learning with the help of a large pre-trained multilingual NMT system. Since this approach produced the best results, we used it to build the NMT models we submitted to the shared task.
Anthology ID:
2023.wmt-1.89
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
950–953
URL:
https://aclanthology.org/2023.wmt-1.89
DOI:
10.18653/v1/2023.wmt-1.89
Cite (ACL):
Pranav Gaikwad, Meet Doshi, Sourabh Deoghare, and Pushpak Bhattacharyya. 2023. Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap. In Proceedings of the Eighth Conference on Machine Translation, pages 950–953, Singapore. Association for Computational Linguistics.
Cite (Informal):
Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap (Gaikwad et al., WMT 2023)
PDF:
https://aclanthology.org/2023.wmt-1.89.pdf