Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task
Matthew Theodore Roque | Carlos Rafael Catalan | Dan John Velasco | Manuel Antonio Rufino | Jan Christian Blaise Cruz
Proceedings of the Ninth Conference on Machine Translation
This paper presents the methodology developed by the Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) for the WMT 2024 Shared Task on Low-Resource Indic Language Translation. We trained standard sequence-to-sequence Transformer models from scratch for both English-to-Indic and Indic-to-English translation directions. Additionally, we explored data augmentation through backtranslation and the application of noisy channel reranking to improve translation quality. A multilingual model trained across all language pairs was also investigated. Our results demonstrate the effectiveness of the multilingual model, with significant performance improvements observed in most language pairs, highlighting the potential of shared language representations in low-resource translation scenarios.
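The abstract mentions noisy channel reranking as one of the explored techniques. As a rough illustration of the general idea (not the paper's specific setup, whose models and weights are not given here), candidate translations can be rescored by combining the direct model log-probability P(y|x) with a channel model log-probability P(x|y) and a language model log-probability P(y); the function and weight names below are hypothetical:

```python
# Minimal sketch of noisy channel reranking.
# Assumptions (not from the paper): candidates arrive with precomputed
# log-probabilities from a direct model, a reverse ("channel") model,
# and a target-side language model; weights are illustrative.

def noisy_channel_score(direct_lp, channel_lp, lm_lp,
                        w_channel=1.0, w_lm=1.0):
    """Combine log P(y|x) + w1 * log P(x|y) + w2 * log P(y)."""
    return direct_lp + w_channel * channel_lp + w_lm * lm_lp

def rerank(candidates):
    """candidates: list of (translation, direct_lp, channel_lp, lm_lp).
    Returns the candidate with the highest combined score."""
    return max(candidates,
               key=lambda c: noisy_channel_score(c[1], c[2], c[3]))

# Example: the direct model alone prefers A, but the combined
# channel + LM evidence flips the choice to B.
cands = [
    ("hypothesis A", -1.2, -3.0, -2.5),
    ("hypothesis B", -1.5, -2.0, -2.0),
]
best = rerank(cands)[0]  # "hypothesis B"
```

In practice the interpolation weights are tuned on a development set; with both weights at zero the reranker reduces to the direct model's own ranking.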