Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task

Matthew Theodore Roque, Carlos Rafael Catalan, Dan John Velasco, Manuel Antonio Rufino, Jan Christian Blaise Cruz
Abstract
This paper presents the methodology developed by the Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) for the WMT 2024 Shared Task on Low-Resource Indic Language Translation. We trained standard sequence-to-sequence Transformer models from scratch for both English-to-Indic and Indic-to-English translation directions. Additionally, we explored data augmentation through backtranslation and applied noisy channel reranking to improve translation quality. We also investigated a multilingual model trained across all language pairs. Our results demonstrate the effectiveness of the multilingual model, with significant performance improvements observed in most language pairs, highlighting the potential of shared language representations in low-resource translation scenarios.
Anthology ID:
2024.wmt-1.62
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
735–741
URL:
https://aclanthology.org/2024.wmt-1.62
DOI:
10.18653/v1/2024.wmt-1.62
Cite (ACL):
Matthew Theodore Roque, Carlos Rafael Catalan, Dan John Velasco, Manuel Antonio Rufino, and Jan Christian Blaise Cruz. 2024. Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task. In Proceedings of the Ninth Conference on Machine Translation, pages 735–741, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task (Roque et al., WMT 2024)
PDF:
https://aclanthology.org/2024.wmt-1.62.pdf