LLaMA Pro: Progressive LLaMA with Block Expansion

Chengyue Wu; Yukang Gan; Yixiao Ge; Zeyu Lu; Jiahao Wang; Ye Feng; Ying Shan; Ping Luo

doi:10.18653/v1/2024.acl-long.352

Abstract

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model’s knowledge while mitigating forgetting. In this paper, we experiment on code and math corpora, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro - Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and the immense potential of reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
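The idea of expanding a model with extra blocks and tuning only those blocks can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the scalar `Block` class, the `expand_blocks` helper, the `group_size` parameter, and the choice to zero the output scale of each copied block so that it starts as an identity map (the abstract does not state how the new blocks are initialized).

```python
import copy
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical toy residual block: y = x + out_scale * (in_scale * x)."""
    in_scale: float
    out_scale: float
    trainable: bool = True

    def forward(self, x: float) -> float:
        return x + self.out_scale * (self.in_scale * x)


def expand_blocks(blocks: List[Block], group_size: int) -> List[Block]:
    """Return a new stack with one extra block after every `group_size` originals.

    Assumption: each added block is a copy of the block before it with its
    output scale set to zero, so the residual branch contributes nothing and
    the expanded stack initially computes the same function as the original.
    Original blocks are marked frozen; only the added blocks are trainable.
    """
    expanded: List[Block] = []
    for i, block in enumerate(blocks):
        frozen = copy.deepcopy(block)
        frozen.trainable = False
        expanded.append(frozen)
        if (i + 1) % group_size == 0:
            added = copy.deepcopy(block)
            added.out_scale = 0.0   # identity at initialization
            added.trainable = True  # only added blocks receive updates
            expanded.append(added)
    return expanded


def run(blocks: List[Block], x: float) -> float:
    """Apply the blocks in order."""
    for block in blocks:
        x = block.forward(x)
    return x


# Eight toy blocks, expanded with one new block after every four.
base = [Block(in_scale=0.5, out_scale=0.1 * (i + 1)) for i in range(8)]
expanded = expand_blocks(base, group_size=4)

before = run(base, 1.0)
after = run(expanded, 1.0)
num_trainable = sum(block.trainable for block in expanded)
```

In this sketch the expanded stack has ten blocks, two of them trainable, and `before` equals `after` because each added block starts as an identity map. Training on a new corpus would then update only the added blocks, which is one way to picture how the original model's behavior is preserved at the start of post-pretraining.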

Anthology ID: 2024.acl-long.352
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6518–6537
URL: https://aclanthology.org/2024.acl-long.352/
DOI: 10.18653/v1/2024.acl-long.352
Cite (ACL): Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, and Ping Luo. 2024. LLaMA Pro: Progressive LLaMA with Block Expansion. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6518–6537, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): LLaMA Pro: Progressive LLaMA with Block Expansion (Wu et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.352.pdf
