DeepTrans: Deep Reasoning Translation via Reinforcement Learning

Jiaan Wang; Fandong Meng; Jie Zhou

doi:10.1162/tacl.a.65

Abstract

Recently, deep reasoning LLMs (e.g., OpenAI o1 and DeepSeek-R1) have shown promising performance on various downstream tasks. Free translation, which requires going beyond word-for-word translation, is an important and interesting task in the multilingual world. However, the task remains under-explored for deep reasoning LLMs. In this paper, we introduce DeepTrans, a deep reasoning translation model that learns free translation via reinforcement learning (RL). Specifically, we carefully build a reward model with pre-defined scoring criteria on both the translation results and the thought processes. During RL, the reward model teaches DeepTrans how to think about and free-translate the given sentences. In addition, our RL training does not need any labeled translations, avoiding human-intensive annotation and resource-intensive data synthesis. Experimental results show the effectiveness of DeepTrans. Using Qwen2.5-7B as the backbone, DeepTrans improves performance by 16.3% in literature translation and outperforms strong deep reasoning LLMs. Moreover, we summarize the failures and interesting findings from our RL exploration. We hope this work can inspire other researchers working on free translation.

Anthology ID: 2026.tacl-1.3
Volume: Transactions of the Association for Computational Linguistics, Volume 14
Year: 2026
Address: Cambridge, MA
Venue: TACL
Publisher: MIT Press
Pages: 47–63
URL: https://aclanthology.org/2026.tacl-1.3/
DOI: 10.1162/tacl.a.65
Cite (ACL): Jiaan Wang, Fandong Meng, and Jie Zhou. 2026. DeepTrans: Deep Reasoning Translation via Reinforcement Learning. Transactions of the Association for Computational Linguistics, 14:47–63.
Cite (Informal): DeepTrans: Deep Reasoning Translation via Reinforcement Learning (Wang et al., TACL 2026)
PDF: https://aclanthology.org/2026.tacl-1.3.pdf
