SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen


Abstract
Transformer-based models, such as BERT and ViT, have achieved state-of-the-art results across different natural language processing (NLP) and computer vision (CV) tasks. However, these models are extremely memory intensive during their fine-tuning process, making them difficult to deploy on GPUs with limited memory resources. To address this issue, we introduce a new tool called SlimFit that reduces the memory requirements of these models by analyzing their training dynamics and dynamically freezing less-contributory layers during fine-tuning. The layers to freeze are chosen using a runtime inter-layer scheduling algorithm. This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100 and ImageNet, with an average accuracy degradation of 0.2%. For such NLP and CV tasks, SlimFit can reduce the total on-device memory usage by up to 3.1x with an accuracy degradation of at most 0.4%. As a result, while fine-tuning ViT on ImageNet and BERT on SQuAD 2.0 with a batch size of 128 requires three and two 32GB GPUs, respectively, SlimFit enables fine-tuning them on a single 32GB GPU without any significant accuracy degradation. The code of SlimFit is available at https://github.com/arashardakani/SlimFit.
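To make the core idea concrete, the sketch below shows one way to freeze less-active layers during fine-tuning in PyTorch. This is a minimal illustration, not SlimFit's actual inter-layer scheduling algorithm: the contribution metric (mean absolute parameter change since the last check), the block-grouping heuristic, and the names score_layers, freeze_least_active, and freeze_fraction are all illustrative assumptions.

```python
# Minimal sketch of training-dynamics-based layer freezing, in the spirit of
# SlimFit but NOT its actual runtime inter-layer scheduling algorithm.
import torch
from collections import defaultdict


def score_layers(model, prev_params):
    """Score each parameter block by how much its weights moved since the last check.

    Blocks are grouped by a crude name-prefix heuristic (an assumption for
    illustration); e.g. 'bert.encoder.layer.3' for BERT-like models.
    """
    scores = defaultdict(float)
    for name, p in model.named_parameters():
        block = ".".join(name.split(".")[:4])
        if name in prev_params:
            scores[block] += (p.detach() - prev_params[name]).abs().mean().item()
    return scores


def freeze_least_active(model, scores, freeze_fraction=0.95):
    """Freeze the blocks with the smallest contribution scores; keep the rest trainable."""
    ranked = sorted(scores, key=scores.get)
    frozen = set(ranked[: int(len(ranked) * freeze_fraction)])
    for name, p in model.named_parameters():
        p.requires_grad = ".".join(name.split(".")[:4]) not in frozen
    return frozen


# Usage inside a fine-tuning loop, re-evaluated every few hundred steps:
#   prev = {n: p.detach().clone() for n, p in model.named_parameters()}
#   ... run some optimizer steps ...
#   frozen = freeze_least_active(model, score_layers(model, prev))
```

Freezing a block this way skips its gradient and optimizer-state storage, which is where the on-device memory savings reported in the abstract come from; SlimFit additionally schedules which layers to freeze at runtime rather than using a fixed metric like the one above.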
Anthology ID:
2024.naacl-long.345
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
6218–6236
URL:
https://aclanthology.org/2024.naacl-long.345
Cite (ACL):
Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, and Koushik Sen. 2024. SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6218–6236, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics (Ardakani et al., NAACL 2024)
PDF:
https://aclanthology.org/2024.naacl-long.345.pdf
Copyright:
2024.naacl-long.345.copyright.pdf