Kazuki Yano


2024

STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano | Takumi Ito | Jun Suzuki
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

Pre-training large language models faces significant memory challenges due to the large size of model weights. We propose STaged parameter-Efficient Pre-training (STEP), which combines ideas from parameter-efficient tuning and staged training. We conduct experiments on pre-training models of various sizes and demonstrate that STEP can achieve up to a 40.4% reduction in maximum memory requirement compared to vanilla pre-training while maintaining comparable performance.