THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation

Yunlong Liang; Fandong Meng; Jie Zhou

doi:10.18653/v1/2025.acl-long.1040

Abstract

Sparse Mixture-of-Experts (MoE) models have achieved significant progress in neural machine translation (NMT). However, current MoE solutions have two limitations that may lead to sub-optimal performance: 1) they inject NMT task knowledge (e.g., domain- or linguistics-specific knowledge) directly into the MoE, although such knowledge is generally unavailable in practical applications, and they neglect the naturally grouped domain/linguistic properties; 2) expert selection depends only on the localized token representation and ignores the context, which is needed to fully grasp the state of each token from a global view. To address these limitations, we propose THOR-MoE, which arms the MoE with hierarchical task-guided and context-responsive routing policies. Specifically, it 1) first predicts the domain/language label and then extracts a mixed domain/language representation to allocate task-level experts in a hierarchical manner; 2) injects context information to enhance token routing within the pre-selected set of task-level experts, which helps each token to be routed accurately to more specialized and suitable experts. Extensive experiments on multi-domain and multilingual translation benchmarks with different architectures consistently demonstrate the superior performance of THOR-MoE. Additionally, THOR-MoE operates as a plug-and-play module compatible with existing Top-k or Top-p routing schemes, ensuring broad applicability across diverse MoE architectures. For instance, compared with vanilla Top-p routing, the context-aware routing achieves an average improvement of 0.75 BLEU with less than 22% activated parameters on multi-domain translation tasks.
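The two-stage routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hierarchical_route`, the weight matrices, the use of mean pooling as the "context", the additive context injection, and the top-k selection inside the candidate set are all assumptions chosen only to make the idea concrete.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mean_pool(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hierarchical_route(tokens, W_cls, task_emb, W_task, W_tok, n_task_experts, k):
    """Hypothetical two-level router: task-level expert set, then token-level choice."""
    # Assumption: the sentence context is the mean of the token representations.
    context = mean_pool(tokens)

    # Step 1: predict a soft domain/language label from the context.
    task_probs = softmax(matvec(W_cls, context))

    # Step 2: mix the task embeddings by their predicted probabilities.
    dim = len(task_emb[0])
    mixed = [sum(p * emb[j] for p, emb in zip(task_probs, task_emb)) for j in range(dim)]

    # Step 3: the mixed representation pre-selects a task-level expert set.
    task_scores = matvec(W_task, mixed)
    candidates = sorted(range(len(task_scores)), key=lambda e: task_scores[e], reverse=True)
    candidates = candidates[:n_task_experts]

    # Step 4: each token is routed within that set, with context injected
    # into its representation (assumption: simple addition).
    routes = []
    for h in tokens:
        enriched = [a + b for a, b in zip(h, context)]
        logits = matvec(W_tok, enriched)
        chosen = sorted(candidates, key=lambda e: logits[e], reverse=True)[:k]
        weights = softmax([logits[e] for e in chosen])
        routes.append(list(zip(chosen, weights)))
    return candidates, routes


# Toy usage with random weights: 5 tokens, hidden size 4, 3 tasks, 8 experts.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


tokens = rand_matrix(5, 4)
candidates, routes = hierarchical_route(
    tokens,
    W_cls=rand_matrix(3, 4),
    task_emb=rand_matrix(3, 4),
    W_task=rand_matrix(8, 4),
    W_tok=rand_matrix(8, 4),
    n_task_experts=4,
    k=2,
)
```

In this sketch every token can only reach experts inside `candidates`, which is the hierarchical constraint; replacing the fixed `k` with a cumulative-probability threshold would give a Top-p style variant of the token-level step.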

Anthology ID: 2025.acl-long.1040
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 21433–21445
URL: https://aclanthology.org/2025.acl-long.1040/
DOI: 10.18653/v1/2025.acl-long.1040
Cite (ACL): Yunlong Liang, Fandong Meng, and Jie Zhou. 2025. THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 21433–21445, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation (Liang et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.1040.pdf
