Unlocking Emergent Modularity in Large Language Models

Zihan Qiu; Zeyu Huang; Jie Fu

Abstract

Modular Neural Networks (MNNs) demonstrate various advantages over monolithic models. Existing MNNs are generally explicit: their modular architectures are pre-defined, with individual modules expected to implement distinct functions. Recent works reveal that standard pre-trained transformers exhibit implicit modularity, termed Emergent Modularity, and indicate that such modular structures emerge spontaneously during the early pre-training phase. Despite the benefits of modularity, most Language Models (LMs) are still treated as monolithic models in the pre-train and fine-tune paradigm, with their emergent modularity locked and underutilized. In this work, focusing on unlocking the emergent modularity in LMs, we show that standard LMs can be fine-tuned as their Mixture-of-Experts (MoE) counterparts without introducing any extra parameters. Such MoEs are derived from emergent modularity and are referred to as Emergent MoEs (EMoE). Our experiments demonstrate that fine-tuning EMoE effectively improves downstream in-domain and out-of-domain generalization compared with vanilla fine-tuning. Our analysis and ablation studies further illustrate that EMoE is robust to various configurations and can scale up to Large Language Models (i.e., Llama2-7B and Llama-30B). Code is available at https://github.com/qiuzh20/EMoE.
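The claim that a dense model can be recast as an MoE "without introducing any extra parameters" can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not say how experts are formed or how tokens are routed, so the contiguous neuron split, the mean-weight gate, and every function name below (`ffn`, `split_into_experts`, `moe_ffn`) are assumptions made for illustration. The sketch relies only on the fact that a feed-forward block with an elementwise activation is a sum over its hidden neurons, so any partition of those neurons yields experts whose combined output equals the dense output.

```python
import numpy as np

def ffn(x, w1, b1, w2, b2):
    """Dense feed-forward block: ReLU(x @ w1 + b1) @ w2 + b2."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def split_into_experts(w1, b1, w2, num_experts):
    """Partition hidden neurons into equal contiguous groups.

    Contiguous grouping is a hypothetical choice; the hidden size must be
    divisible by num_experts. The slices reuse the dense weights only.
    """
    groups = np.split(np.arange(w1.shape[1]), num_experts)
    return [(w1[:, g], b1[g], w2[g, :]) for g in groups]

def moe_ffn(x, experts, b2, top_k):
    """Send each token through its top_k experts and sum their outputs.

    The gate (mean of each expert's first-layer weights) is an assumption;
    it is computed from existing weights, so no parameters are added.
    """
    gate_w = np.stack([e_w1.mean(axis=1) for e_w1, _, _ in experts], axis=1)
    scores = x @ gate_w                                  # (tokens, experts)
    chosen = np.argsort(-scores, axis=1)[:, :top_k]      # top_k expert ids per token
    out = np.tile(b2, (x.shape[0], 1))                   # start from the output bias
    for e, (e_w1, e_b1, e_w2) in enumerate(experts):
        mask = (chosen == e).any(axis=1)                 # tokens routed to expert e
        out[mask] += np.maximum(x[mask] @ e_w1 + e_b1, 0.0) @ e_w2
    return out

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts = 8, 32, 4
w1 = rng.normal(size=(d_model, d_hidden))
b1 = rng.normal(size=d_hidden)
w2 = rng.normal(size=(d_hidden, d_model))
b2 = rng.normal(size=d_model)
x = rng.normal(size=(5, d_model))

experts = split_into_experts(w1, b1, w2, n_experts)
dense = ffn(x, w1, b1, w2, b2)

# With every expert active, the decomposition reproduces the dense block.
assert np.allclose(moe_ffn(x, experts, b2, top_k=n_experts), dense)

# With top_k < n_experts, only part of the hidden layer is evaluated per token.
sparse = moe_ffn(x, experts, b2, top_k=2)
```

Setting `top_k` equal to the number of experts recovers the dense output exactly, which is what makes the conversion parameter-free; smaller values of `top_k` give a sparse approximation whose quality depends on how the neurons are grouped and gated.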

Anthology ID: 2024.naacl-long.144
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 2638–2660
URL: https://aclanthology.org/2024.naacl-long.144
Cite (ACL): Zihan Qiu, Zeyu Huang, and Jie Fu. 2024. Unlocking Emergent Modularity in Large Language Models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2638–2660, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Unlocking Emergent Modularity in Large Language Models (Qiu et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.144.pdf
