TokenDrop + BucketSampler: Towards Efficient Padding-free Fine-tuning of Language Models

Amrit Nagarajan, Anand Raghunathan


Abstract
The great success of Language Models (LMs) for various Natural Language Processing (NLP) tasks is accompanied by computational challenges during both pre-training and fine-tuning. Pre-training has attracted significant attention due to its huge computational footprint. We focus on the fine-tuning of pre-trained LMs, which is expected to be performed much more frequently as the pre-trained models are adapted to downstream tasks. During fine-tuning, the presence of variable-length input sequences necessitates the use of padding tokens when batching sequences. These padding tokens lead to ineffectual computations, adversely impacting the efficiency of fine-tuning. We also observe that LMs memorize the limited task-specific training data despite the use of known regularization methods. Based on these insights, we present TokenDrop + BucketSampler, a framework that simultaneously improves efficiency and accuracy of LM fine-tuning. BucketSampler generates batches of samples with lower variance in sequence lengths to reduce the number of padding tokens, but does so without the accompanying accuracy drop seen in previous approaches. TokenDrop is a new regularizer that prunes a random subset of insignificant tokens from each input sequence in every epoch to prevent overfitting. TokenDrop drops more tokens from the longer sequences in each batch to further reduce variance in input lengths and the need for padding. TokenDrop + BucketSampler accelerates fine-tuning on diverse downstream tasks by up to 10.61X, while also producing models that are up to 1.17% more accurate compared to conventional fine-tuning. Code is available at https://github.com/amrnag/TokenDrop-BucketSampler.
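A minimal sketch of the two ideas described in the abstract, assuming word-level tokens and a stopword-style "insignificance" set; function names, the drop-rate schedule, and the heuristic are assumptions for illustration, not the authors' released implementation (see the repository linked above for that):

```python
import random

def bucket_batches(tokenized, batch_size, seed=0):
    """Group sequences of similar length into batches to minimize padding.
    Sorting by length before slicing keeps intra-batch length variance low."""
    rng = random.Random(seed)
    order = sorted(range(len(tokenized)), key=lambda i: len(tokenized[i]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    rng.shuffle(batches)  # shuffle batch order so training is not length-ordered
    return batches

def token_drop(batch, insignificant, base_rate=0.1, seed=0):
    """Randomly drop 'insignificant' tokens each epoch; longer sequences in the
    batch lose more tokens, pushing all lengths toward the batch minimum."""
    rng = random.Random(seed)
    min_len = min(len(seq) for seq in batch)
    out = []
    for seq in batch:
        # extra drops for longer sequences further reduce the need for padding
        n_drop = max(int(base_rate * len(seq)), len(seq) - min_len)
        droppable = [i for i, tok in enumerate(seq) if tok in insignificant]
        drop = set(rng.sample(droppable, min(n_drop, len(droppable))))
        out.append([tok for i, tok in enumerate(seq) if i not in drop])
    return out

# Toy usage with word tokens; stopwords stand in for "insignificant" tokens.
data = [["the", "movie", "was", "great"],
        ["a", "truly", "fine", "film", "indeed", "really"]]
for idx_batch in bucket_batches(data, batch_size=2):
    batch = [data[i] for i in idx_batch]
    print(token_drop(batch, insignificant={"the", "a", "really", "indeed"}))
```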
Anthology ID:
2023.findings-emnlp.782
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11682–11695
URL:
https://aclanthology.org/2023.findings-emnlp.782
DOI:
10.18653/v1/2023.findings-emnlp.782
Cite (ACL):
Amrit Nagarajan and Anand Raghunathan. 2023. TokenDrop + BucketSampler: Towards Efficient Padding-free Fine-tuning of Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 11682–11695, Singapore. Association for Computational Linguistics.
Cite (Informal):
TokenDrop + BucketSampler: Towards Efficient Padding-free Fine-tuning of Language Models (Nagarajan & Raghunathan, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.782.pdf