Router-Tuning: A Simple and Effective Approach for Dynamic Depth

Shwai He; Tao Ge; Guoheng Sun; Bowei Tian; Xiaoyang Wang; Dong Yu (于东)

doi:10.18653/v1/2025.emnlp-main.99

Abstract

Mixture of Depths (MoD) was introduced to improve computational efficiency by dynamically skipping less important layers, reducing redundant computation while maintaining model capacity. Despite its promise, existing MoD approaches remain under-explored and face two main challenges: (1) high training costs, because the entire model must be trained along with the routers that determine which layers to skip, and (2) performance degradation when important layers are bypassed. To address the first challenge, we propose Router-Tuning, which fine-tunes only the routers on a small dataset, drastically reducing the computational overhead associated with full model training. For the second challenge, we investigate Router-Tuning across different architectures and granularities, demonstrating its effectiveness on attention layers and MoE layers. This method preserves the model’s performance while significantly enhancing computational and memory efficiency. Extensive experiments demonstrate that our approach delivers competitive results while dramatically improving computational efficiency, e.g., a 21% speedup with only a 0.2% performance drop. The code will be released upon acceptance.
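
The idea summarized in the abstract, a router that decides whether a layer runs while only the router is trained, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RoutedLayer`, the sigmoid gate, the 0.5 threshold, and the elementwise stand-in for the frozen layer are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function used here as a hypothetical router activation."""
    return 1.0 / (1.0 + math.exp(-x))


class RoutedLayer:
    """A frozen layer wrapped with a small trainable router (illustrative only)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Frozen weights: a toy stand-in for an attention or MoE block.
        self.frozen_w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        # Router weights: the only parameters that would be fine-tuned.
        self.router_w = [0.0] * dim
        self.router_b = 0.0

    def router_score(self, x):
        """Score in (0, 1) indicating how much this input needs the layer."""
        logit = sum(w * xi for w, xi in zip(self.router_w, x)) + self.router_b
        return sigmoid(logit)

    def layer_update(self, x):
        """Toy computation of the frozen layer (elementwise scaling)."""
        return [w * xi for w, xi in zip(self.frozen_w, x)]

    def forward(self, x, threshold=0.5):
        """Return (output, executed); skipped inputs use the residual path only."""
        score = self.router_score(x)
        if score < threshold:
            # Layer skipped: no layer computation is performed.
            return list(x), False
        update = self.layer_update(x)
        # Assumption: the layer output is scaled by the router score.
        return [xi + score * u for xi, u in zip(x, update)], True

    def trainable_parameters(self):
        """Only router parameters are exposed for tuning; layer weights stay frozen."""
        return {"router_w": self.router_w, "router_b": self.router_b}


layer = RoutedLayer(dim=4)
x = [1.0, 2.0, 3.0, 4.0]
out_run, executed_run = layer.forward(x)    # router score is 0.5, so the layer runs
layer.router_b = -5.0                       # push the router toward skipping
out_skip, executed_skip = layer.forward(x)  # low score, so the layer is bypassed
```

In this sketch an optimizer would be given only the values returned by `trainable_parameters()`, which mirrors the abstract's point that tuning the routers alone avoids the cost of training the full model.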

Anthology ID: 2025.emnlp-main.99
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 1925–1938
URL: https://aclanthology.org/2025.emnlp-main.99/
DOI: 10.18653/v1/2025.emnlp-main.99
Cite (ACL): Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, and Dong Yu. 2025. Router-Tuning: A Simple and Effective Approach for Dynamic Depth. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 1925–1938, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Router-Tuning: A Simple and Effective Approach for Dynamic Depth (He et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.99.pdf
Checklist: 2025.emnlp-main.99.checklist.pdf
