Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models

Pingjie Wang; Hongcheng Liu; Yanfeng Wang; Yu Wang

Abstract

Structured pruning is an effective technique for compressing pre-trained language models (PLMs): it reduces model size and improves inference speed for efficient deployment. However, most existing pruning algorithms require retraining, which adds computational overhead. Although some retraining-free approaches have been proposed for classification tasks, they still require a fully fine-tuned model for the task and may cause catastrophic performance degradation on generative tasks. To address these challenges, we propose P-pruning (pre-pruning), a task-specific compression framework that prunes redundant modules of PLMs before fine-tuning, thereby reducing the cost of fine-tuning. We also introduce a pruning algorithm for this framework, which comprises two techniques: (1) module clustering, which clusters the outputs of all heads and neurons based on the task input; and (2) centroid selection, which identifies the most salient element in each cluster and prunes the others. We apply our method to BERT and GPT-2 and evaluate it on the GLUE, SQuAD, WikiText-2, WikiText-103, and PTB datasets. Experimental results demonstrate that our approach achieves higher performance on both classification and generative tasks while also reducing the time required for fine-tuning.
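
The two steps named in the abstract, module clustering followed by centroid selection, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define saliency, so the sketch assumes that the most salient module in a cluster is the one whose output lies closest to the cluster centroid, and it uses plain k-means with a deterministic farthest-point initialisation. The function names (`kmeans`, `select_modules`) and the toy `head_outputs` data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(outputs, k, iters=20):
    """Plain Lloyd's k-means (an assumed stand-in for module clustering)."""
    # Farthest-point initialisation keeps the sketch deterministic.
    centroids = [list(outputs[0])]
    while len(centroids) < k:
        far = max(outputs, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in outputs
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for j in range(k):
            members = [v for v, a in zip(outputs, assign) if a == j]
            if members:
                centroids[j] = mean(members)
    return assign, centroids


def select_modules(outputs, k):
    """Return indices of modules to keep: one representative per cluster.

    Assumption: "most salient" is approximated by "closest to the centroid".
    All other modules in the cluster would be pruned.
    """
    assign, centroids = kmeans(outputs, k)
    keep = []
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            keep.append(min(members, key=lambda i: sq_dist(outputs[i], centroids[j])))
    return sorted(keep)


# Toy data: each row is one head's (flattened) output on a few task inputs.
head_outputs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.1, -0.1, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.1, 0.9],
    [0.1, 0.9, 1.0],
]
kept = select_modules(head_outputs, k=2)
print("heads kept:", kept)
```

In this toy example the six heads fall into two groups of near-duplicates, so keeping one representative per group reduces six heads to two. In a real model the rows would be head or neuron activations collected on task inputs, and the number of clusters would be set by the target compression ratio.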

Anthology ID: 2024.lrec-main.1162
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 13279–13289
URL: https://aclanthology.org/2024.lrec-main.1162
Cite (ACL): Pingjie Wang, Hongcheng Liu, Yanfeng Wang, and Yu Wang. 2024. Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13279–13289, Torino, Italia. ELRA and ICCL.
Cite (Informal): Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models (Wang et al., LREC-COLING 2024)
PDF: https://aclanthology.org/2024.lrec-main.1162.pdf