TranssionMT’s Submission to the Indic MT Shared Task in WMT 2025

Zebiao Zhou; Hui Li; Xiangxun Zhu; Kangzhen Liu

doi:10.18653/v1/2025.wmt-1.106

Abstract

This study addresses the low-resource Indian language translation task (English-Assamese, English-Manipuri) at WMT 2025, proposing a cross-iterative back-translation and data augmentation approach based on dual pre-trained models to enhance translation performance in low-resource scenarios. The research methodology primarily encompasses four aspects: (1) Utilizing the open-source pre-trained models IndicTrans2_1B and NLLB_3.3B, fine-tuning them on the official bilingual data, followed by alternating back-translation and incremental training to generate high-quality pseudo-parallel corpora and optimize model parameters through multiple iterations; (2) Employing the open-source semantic similarity model (all-mpnet-base-v2) to filter out monolingual sentences with low semantic similarity to the test set from open-source corpora such as NLLB and BPCC, thereby improving the relevance of monolingual data to the task; (3) Cleaning the training data, including removing URL and HTML content, eliminating untranslated sentences in back-translation, standardizing symbol formats, and normalizing capitalization of the first letter; (4) During the model inference phase, combining the outputs generated by the fine-tuned IndicTrans2_1B and NLLB_3.3B
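The cleaning step in aspect (3) can be illustrated with a minimal Python sketch. The abstract does not give the exact rules used, so the regular expressions, the quote mapping, and the function names `clean_sentence` and `clean_pairs` are assumptions made for illustration, not the authors' implementation.

```python
import html
import re

# Hypothetical patterns; the paper does not specify its exact rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
TAG_RE = re.compile(r"<[^>]+>")
# Assumed example of "standardizing symbol formats": curly to straight quotes.
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def clean_sentence(text: str) -> str:
    """Strip HTML and URLs, normalize symbols and spacing, capitalize the first letter."""
    text = html.unescape(text)          # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)        # drop HTML tags
    text = URL_RE.sub(" ", text)        # drop URLs
    text = text.translate(QUOTES)       # standardize quote symbols
    text = re.sub(r"\s+", " ", text).strip()
    # Only cased scripts are affected; Assamese and Meitei text is left unchanged.
    if text and text[0].islower():
        text = text[0].upper() + text[1:]
    return text


def clean_pairs(pairs):
    """Clean (source, back-translation) pairs and drop empty or untranslated ones."""
    kept = []
    for src, tgt in pairs:
        src, tgt = clean_sentence(src), clean_sentence(tgt)
        if not src or not tgt:
            continue  # nothing left after cleaning
        if src.casefold() == tgt.casefold():
            continue  # the model copied its input instead of translating
        kept.append((src, tgt))
    return kept
```

Treating an output identical to its input as "untranslated" is the simplest possible criterion; a real pipeline might instead use script detection or language identification.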

Anthology ID: 2025.wmt-1.106
Volume: Proceedings of the Tenth Conference on Machine Translation
Month: November
Year: 2025
Address: Suzhou, China
Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue: WMT
Publisher: Association for Computational Linguistics
Pages: 1271–1275
URL: https://aclanthology.org/2025.wmt-1.106/
DOI: 10.18653/v1/2025.wmt-1.106
Cite (ACL): Zebiao Zhou, Hui Li, Xiangxun Zhu, and Kangzhen Liu. 2025. TranssionMT’s Submission to the Indic MT Shared Task in WMT 2025. In Proceedings of the Tenth Conference on Machine Translation, pages 1271–1275, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): TranssionMT’s Submission to the Indic MT Shared Task in WMT 2025 (Zhou et al., WMT 2025)
PDF: https://aclanthology.org/2025.wmt-1.106.pdf