AkibaNLP-TUT: Injecting Language-Specific Word-Level Noise for Low-Resource Language Translation

Shoki Hamada, Tomoyosi Akiba, Hajime Tsukada


Abstract
In this paper, we describe our system for the WMT 2025 Low-Resource Indic Language Translation Shared Task. The language directions addressed are Assamese↔English and Manipuri→English. We propose a method to improve translation performance from low-resource languages (LRLs) to English by injecting language-specific word-level noise into the parallel corpus of a closely related high-resource language (HRL). In the proposed method, word replacements are performed based on edit distance, using vocabulary and frequency information extracted from an LRL monolingual corpus. Experiments conducted on Assamese and Manipuri show that, in the absence of LRL parallel data, the proposed method outperforms both the setting without noise injection and existing approaches. Furthermore, we confirm that increasing the size of the monolingual corpus used for noise injection leads to improved translation performance.
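The abstract only outlines the noise-injection idea, so the following is a minimal sketch of how edit-distance-based word replacement could look, not the authors' implementation. The function names, the `max_dist` threshold, the `replace_prob` rate, and frequency-weighted sampling among candidates are all assumptions made for illustration; the abstract states only that replacements rely on edit distance and on vocabulary and frequency information from an LRL monolingual corpus.

```python
import random
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def inject_noise(hrl_tokens, lrl_freq, max_dist=1, replace_prob=0.5, rng=None):
    """Replace HRL tokens with nearby LRL vocabulary words.

    Hypothetical parameters (not taken from the paper):
      max_dist     -- largest edit distance allowed for a replacement
      replace_prob -- chance of replacing a token that has candidates
    Candidates are sampled in proportion to their LRL corpus frequency,
    which is one plausible way to use the frequency information.
    """
    rng = rng or random.Random(0)
    noisy = []
    for tok in hrl_tokens:
        candidates = [w for w in lrl_freq
                      if w != tok and edit_distance(tok, w) <= max_dist]
        if candidates and rng.random() < replace_prob:
            weights = [lrl_freq[w] for w in candidates]
            tok = rng.choices(candidates, weights=weights, k=1)[0]
        noisy.append(tok)
    return noisy


# Toy example with placeholder strings standing in for HRL and LRL words.
lrl_freq = Counter({"cot": 5, "zebra": 1})
noisy_source = inject_noise(["cat", "dog"], lrl_freq, replace_prob=1.0)
```

In this sketch only the HRL source side is perturbed and the English target side is left untouched, so that the model sees LRL-like surface forms paired with the original English; that choice is also an assumption. The linear scan over the vocabulary is kept for readability and would need an index to scale to a real corpus.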
Anthology ID:
2025.wmt-1.104
Volume:
Proceedings of the Tenth Conference on Machine Translation
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
1259–1264
URL:
https://aclanthology.org/2025.wmt-1.104/
Cite (ACL):
Shoki Hamada, Tomoyosi Akiba, and Hajime Tsukada. 2025. AkibaNLP-TUT: Injecting Language-Specific Word-Level Noise for Low-Resource Language Translation. In Proceedings of the Tenth Conference on Machine Translation, pages 1259–1264, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
AkibaNLP-TUT: Injecting Language-Specific Word-Level Noise for Low-Resource Language Translation (Hamada et al., WMT 2025)
PDF:
https://aclanthology.org/2025.wmt-1.104.pdf