RoMantra: Optimizing Neural Machine Translation for Low-Resource Languages through Romanization

Govind Soni; Pushpak Bhattacharyya

Abstract

Neural Machine Translation (NMT) for low-resource language pairs with distinct scripts, such as Hindi-Chinese and Japanese-Hindi, poses significant challenges because the languages differ in both script and linguistic structure. This paper investigates the efficacy of romanization as a preprocessing step to bridge these gaps. We compare baseline models trained on native scripts with models that incorporate romanization in three configurations: both-side, source-side only, and target-side only. Additionally, we introduce a script restoration model that converts romanized output back to the native script, so that translations can be evaluated in their original script. Our experiments show that romanization, particularly when applied to both sides, improves translation quality across the studied language pairs. The script restoration model further enhances the practicality of this approach by enabling evaluation in native scripts, at the cost of some performance loss. This work provides insights into leveraging romanization for NMT in low-resource, cross-script settings, and presents a promising direction for under-researched language combinations.
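
The configurations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `prepare_pair`, the configuration labels, and the `romanize` callable are hypothetical names introduced here, and the romanizer itself is left as a parameter because the abstract does not say which tool was used.

```python
from typing import Callable, Tuple

# Hypothetical labels for the settings described in the abstract:
# a native-script baseline plus three romanization configurations.
CONFIGS = ("baseline", "both", "source_only", "target_only")


def prepare_pair(
    src: str,
    tgt: str,
    config: str,
    romanize: Callable[[str], str],
) -> Tuple[str, str]:
    """Return one (source, target) training pair under the given configuration.

    `romanize` is an assumed callable that maps native-script text to
    Latin script; any romanizer with that shape can be plugged in.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if config in ("both", "source_only"):
        src = romanize(src)  # romanize the source side
    if config in ("both", "target_only"):
        tgt = romanize(tgt)  # romanize the target side
    return src, tgt
```

Whenever the target side is romanized ("both" or "target_only" in this sketch), the model's output is in Latin script, which is why the abstract pairs those settings with a script restoration step before native-script evaluation.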

Anthology ID: 2024.icon-1.18
Volume: Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month: December
Year: 2024
Address: AU-KBC Research Centre, Chennai, India
Editors: Sobha Lalitha Devi, Karunesh Arora
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 157–168
URL: https://aclanthology.org/2024.icon-1.18/
Cite (ACL): Govind Soni and Pushpak Bhattacharyya. 2024. RoMantra: Optimizing Neural Machine Translation for Low-Resource Languages through Romanization. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 157–168, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal): RoMantra: Optimizing Neural Machine Translation for Low-Resource Languages through Romanization (Soni & Bhattacharyya, ICON 2024)
PDF: https://aclanthology.org/2024.icon-1.18.pdf
