STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Kazuki Yano, Takumi Ito, Jun Suzuki


Abstract
Pre-training large language models faces significant memory challenges due to the large size of model weights. We propose STaged parameter-Efficient Pre-training (STEP), which combines ideas from parameter-efficient tuning and staged training. We conduct experiments on pre-training models of various sizes and demonstrate that STEP can achieve up to a 40.4% reduction in maximum memory requirement compared to vanilla pre-training while maintaining comparable performance.
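
The abstract describes the approach only at a high level. As a rough illustration of combining staged (growing) training with parameter-efficient updates, the PyTorch sketch below freezes the layers trained in an earlier stage behind small trainable low-rank adapters and fully trains only the newly added layers, so optimizer state is kept for just a fraction of the parameters. All names and hyperparameters here (LoRALinear, grow_model, rank=8, layer counts) are illustrative assumptions, not the authors' released implementation.

    # Hypothetical sketch, not the STEP reference code.
    import torch
    import torch.nn as nn


    class LoRALinear(nn.Module):
        """A frozen linear layer augmented with a trainable low-rank update."""

        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # frozen: no gradients, no optimizer state
            self.lora_a = nn.Linear(base.in_features, rank, bias=False)
            self.lora_b = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)   # adapter starts as a zero update

        def forward(self, x):
            return self.base(x) + self.lora_b(self.lora_a(x))


    def grow_model(stage1_layers: nn.ModuleList, n_new_layers: int, d_model: int):
        """Later stage: wrap stage-1 layers in adapters and append new, fully
        trainable layers."""
        grown = nn.ModuleList()
        for layer in stage1_layers:
            grown.append(LoRALinear(layer))            # old layer: adapter-only training
        for _ in range(n_new_layers):
            grown.append(nn.Linear(d_model, d_model))  # new layer: full training
        return grown


    if __name__ == "__main__":
        d_model = 256
        # Stage 1: a small stack trained normally (training loop omitted).
        stage1 = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(4)])
        # Stage 2: grow depth; only adapters and new layers carry optimizer state.
        stage2 = grow_model(stage1, n_new_layers=4, d_model=d_model)
        trainable = [p for p in stage2.parameters() if p.requires_grad]
        optimizer = torch.optim.AdamW(trainable, lr=1e-4)
        print(f"trainable params: {sum(p.numel() for p in trainable):,}")

Because frozen weights carry no gradients or Adam moments in this setup, peak memory in the later stage is dominated by the newly added layers and the adapters, which is the kind of saving the abstract quantifies.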
Anthology ID: 2024.acl-srw.50
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Xiyan Fu, Eve Fleisig
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 607–614
URL: https://aclanthology.org/2024.acl-srw.50
Cite (ACL): Kazuki Yano, Takumi Ito, and Jun Suzuki. 2024. STEP: Staged Parameter-Efficient Pre-training for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 607–614, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): STEP: Staged Parameter-Efficient Pre-training for Large Language Models (Yano et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-srw.50.pdf