Manuel Antonio Rufino


2024

Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task
Matthew Theodore Roque | Carlos Rafael Catalan | Dan John Velasco | Manuel Antonio Rufino | Jan Christian Blaise Cruz
Proceedings of the Ninth Conference on Machine Translation

This paper presents the methodology developed by the Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) for the WMT 2024 Shared Task on Low-Resource Indic Language Translation. We trained standard sequence-to-sequence Transformer models from scratch for both English-to-Indic and Indic-to-English translation directions. Additionally, we explored data augmentation through backtranslation and the application of noisy channel reranking to improve translation quality. A multilingual model trained across all language pairs was also investigated. Our results demonstrate the effectiveness of the multilingual model, with significant performance improvements observed in most language pairs, highlighting the potential of shared language representations in low-resource translation scenarios.
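As an illustration of the noisy channel reranking mentioned above, a candidate translation y for a source x can be rescored by combining the direct model score log p(y|x) with a reverse ("channel") model score log p(x|y) and a language model score log p(y). The sketch below is a minimal illustration of that scoring scheme; the function names, weights, and numbers are assumptions for demonstration, not the authors' implementation.

```python
# Minimal sketch of noisy channel reranking (illustrative only; names,
# weights, and scores are assumed, not taken from the paper).
def noisy_channel_score(log_p_y_given_x, log_p_x_given_y, log_p_y,
                        lambda_ch=1.0, lambda_lm=1.0):
    """Combine direct, channel, and language model log-probabilities."""
    return log_p_y_given_x + lambda_ch * log_p_x_given_y + lambda_lm * log_p_y

def rerank(candidates):
    """candidates: list of (translation, log_p_y_given_x, log_p_x_given_y, log_p_y).
    Returns the translation with the highest combined score."""
    return max(candidates,
               key=lambda c: noisy_channel_score(c[1], c[2], c[3]))[0]

# The direct model's top beam candidate can be overruled when the channel
# and language models prefer another hypothesis.
beam = [
    ("hypothesis A", -1.0, -5.0, -4.0),  # best under the direct model alone
    ("hypothesis B", -1.5, -2.0, -2.5),  # better under the combined score
]
print(rerank(beam))  # -> hypothesis B
```

With equal weights, hypothesis A scores -10.0 and hypothesis B scores -6.0, so reranking selects B even though A had the higher direct-model probability.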

Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task
Dan John Velasco | Manuel Antonio Rufino | Jan Christian Blaise Cruz
Proceedings of the Ninth Conference on Machine Translation

This paper details the submission of the Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) to the WMT 2024 Low-resource Languages of Spain shared task. We trained translation models for Spanish to Aragonese, Spanish to Aranese/Occitan, and Spanish to Asturian using a standard sequence-to-sequence Transformer architecture, augmenting it with a noisy channel reranking strategy to select better outputs during decoding. For Spanish to Asturian translation, our method reaches BLEU scores comparable to a strong commercial baseline translation system using only constrained data, backtranslations, noisy channel reranking, and a shared vocabulary spanning all four languages.