@inproceedings{shurtz-etal-2025-scripts,
title = "When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer",
author = "Shurtz, Ammon and
Richardson, Christian and
Richardson, Stephen D.",
editor = "Adelani, David Ifeoluwa and
Arnett, Catherine and
Ataman, Duygu and
Chang, Tyler A. and
Gonen, Hila and
Raja, Rahul and
Schmidt, Fabian and
Stap, David and
Wang, Jiayi",
booktitle = "Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)",
month = nov,
year = "2025",
address = "Suzhuo, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.mrl-main.22/",
pages = "336--346",
ISBN = "979-8-89176-345-6",
abstract = "Multilingual Neural Machine Translation (MNMT) models enhance translation quality for low-resource languages by exploiting cross-lingual similarities during training{---}a process known as knowledge transfer. This transfer is particularly effective between languages that share lexical or structural features, often enabled by a common orthography. However, languages with strong phonetic and lexical similarities but distinct writing systems experience limited benefits, as the absence of a shared orthography hinders knowledge transfer. To address this limitation, we propose an approach based on phonetic information that enhances token-level alignment across scripts by leveraging transliterations. We systematically evaluate several phonetic transcription techniques and strategies for incorporating phonetic information into NMT models. Our results show that using a shared encoder to process orthographic and phonetic inputs separately consistently yields the best performance for Khmer, Thai, and Lao in both directions with English, and that our custom Cognate-Aware Transliteration (CAT) method consistently improves translation quality over the baseline."
}<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="shurtz-etal-2025-scripts">
<titleInfo>
<title>When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer</title>
</titleInfo>
<name type="personal">
<namePart type="given">Ammon</namePart>
<namePart type="family">Shurtz</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Christian</namePart>
<namePart type="family">Richardson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Stephen</namePart>
<namePart type="given">D</namePart>
<namePart type="family">Richardson</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="given">Ifeoluwa</namePart>
<namePart type="family">Adelani</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Catherine</namePart>
<namePart type="family">Arnett</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Duygu</namePart>
<namePart type="family">Ataman</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tyler</namePart>
<namePart type="given">A</namePart>
<namePart type="family">Chang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hila</namePart>
<namePart type="family">Gonen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rahul</namePart>
<namePart type="family">Raja</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fabian</namePart>
<namePart type="family">Schmidt</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Stap</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jiayi</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Suzhuo, China</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-345-6</identifier>
</relatedItem>
<abstract>Multilingual Neural Machine Translation (MNMT) models enhance translation quality for low-resource languages by exploiting cross-lingual similarities during training—a process known as knowledge transfer. This transfer is particularly effective between languages that share lexical or structural features, often enabled by a common orthography. However, languages with strong phonetic and lexical similarities but distinct writing systems experience limited benefits, as the absence of a shared orthography hinders knowledge transfer. To address this limitation, we propose an approach based on phonetic information that enhances token-level alignment across scripts by leveraging transliterations. We systematically evaluate several phonetic transcription techniques and strategies for incorporating phonetic information into NMT models. Our results show that using a shared encoder to process orthographic and phonetic inputs separately consistently yields the best performance for Khmer, Thai, and Lao in both directions with English, and that our custom Cognate-Aware Transliteration (CAT) method consistently improves translation quality over the baseline.</abstract>
<identifier type="citekey">shurtz-etal-2025-scripts</identifier>
<location>
<url>https://aclanthology.org/2025.mrl-main.22/</url>
</location>
<part>
<date>2025-11</date>
<extent unit="page">
<start>336</start>
<end>346</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer
%A Shurtz, Ammon
%A Richardson, Christian
%A Richardson, Stephen D.
%Y Adelani, David Ifeoluwa
%Y Arnett, Catherine
%Y Ataman, Duygu
%Y Chang, Tyler A.
%Y Gonen, Hila
%Y Raja, Rahul
%Y Schmidt, Fabian
%Y Stap, David
%Y Wang, Jiayi
%S Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-345-6
%F shurtz-etal-2025-scripts
%X Multilingual Neural Machine Translation (MNMT) models enhance translation quality for low-resource languages by exploiting cross-lingual similarities during training—a process known as knowledge transfer. This transfer is particularly effective between languages that share lexical or structural features, often enabled by a common orthography. However, languages with strong phonetic and lexical similarities but distinct writing systems experience limited benefits, as the absence of a shared orthography hinders knowledge transfer. To address this limitation, we propose an approach based on phonetic information that enhances token-level alignment across scripts by leveraging transliterations. We systematically evaluate several phonetic transcription techniques and strategies for incorporating phonetic information into NMT models. Our results show that using a shared encoder to process orthographic and phonetic inputs separately consistently yields the best performance for Khmer, Thai, and Lao in both directions with English, and that our custom Cognate-Aware Transliteration (CAT) method consistently improves translation quality over the baseline.
%U https://aclanthology.org/2025.mrl-main.22/
%P 336-346

Markdown (Informal)
[When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer](https://aclanthology.org/2025.mrl-main.22/) (Shurtz et al., MRL 2025)
ACL
Ammon Shurtz, Christian Richardson, and Stephen D. Richardson. 2025. When Scripts Diverge: Strengthening Low-Resource Neural Machine Translation Through Phonetic Cross-Lingual Transfer. In Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025), pages 336–346, Suzhou, China. Association for Computational Linguistics.