STEP: Staged Parameter-Efficient Pre-training for Large Language Models

Kazuki Yano, Takumi Ito, Jun Suzuki


Abstract
Pre-training large language models faces significant memory challenges due to the large size of model weights. We propose STaged parameter-Efficient Pre-training (STEP), which combines ideas from parameter-efficient tuning and staged training. We conduct experiments on pre-training models of various sizes and demonstrate that STEP can achieve up to a 40.4% reduction in maximum memory requirement compared to vanilla pre-training while maintaining comparable performance.
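
The abstract describes the approach only at a high level. As a rough illustration of combining staged (growing) training with parameter-efficient updates, the PyTorch sketch below freezes the layers trained in an earlier stage behind small trainable low-rank adapters and fully trains only the newly added layers, so optimizer state is kept for just a fraction of the parameters. All names and hyperparameters here (LoRALinear, grow_model, rank=8, layer counts) are illustrative assumptions, not the authors' released implementation.

    # Hypothetical sketch, not the STEP reference code.
    import torch
    import torch.nn as nn


    class LoRALinear(nn.Module):
        """A frozen linear layer augmented with a trainable low-rank update."""

        def __init__(self, base: nn.Linear, rank: int = 8):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False          # frozen: no gradients, no optimizer state
            self.lora_a = nn.Linear(base.in_features, rank, bias=False)
            self.lora_b = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)   # adapter starts as a zero update

        def forward(self, x):
            return self.base(x) + self.lora_b(self.lora_a(x))


    def grow_model(stage1_layers: nn.ModuleList, n_new_layers: int, d_model: int):
        """Later stage: wrap stage-1 layers in adapters and append new, fully
        trainable layers."""
        grown = nn.ModuleList()
        for layer in stage1_layers:
            grown.append(LoRALinear(layer))            # old layer: adapter-only training
        for _ in range(n_new_layers):
            grown.append(nn.Linear(d_model, d_model))  # new layer: full training
        return grown


    if __name__ == "__main__":
        d_model = 256
        # Stage 1: a small stack trained normally (training loop omitted).
        stage1 = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(4)])
        # Stage 2: grow depth; only adapters and new layers carry optimizer state.
        stage2 = grow_model(stage1, n_new_layers=4, d_model=d_model)
        trainable = [p for p in stage2.parameters() if p.requires_grad]
        optimizer = torch.optim.AdamW(trainable, lr=1e-4)
        print(f"trainable params: {sum(p.numel() for p in trainable):,}")

Because frozen weights carry no gradients or Adam moments in this setup, peak memory in the later stage is dominated by the newly added layers and the adapters, which is the kind of saving the abstract quantifies.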
Anthology ID: 2024.acl-srw.50
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Xiyan Fu, Eve Fleisig
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 607–614
URL: https://aclanthology.org/2024.acl-srw.50
Cite (ACL): Kazuki Yano, Takumi Ito, and Jun Suzuki. 2024. STEP: Staged Parameter-Efficient Pre-training for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 607–614, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): STEP: Staged Parameter-Efficient Pre-training for Large Language Models (Yano et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-srw.50.pdf