Structured Pruning for Efficient Generative Pre-trained Language Models

Chaofan Tao, Lu Hou, Haoli Bai, Jiansheng Wei, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong


Abstract
The increasing sizes of large generative Pre-trained Language Models (PLMs) hinder their deployment in real-world applications. To obtain efficient PLMs, previous studies mostly focus on pruning the attention heads and feed-forward networks (FFNs) of the Transformer. Nevertheless, we find that in generative PLMs, the hidden dimension shared by many other modules (e.g., embedding layer and layer normalization) contains persistent outliers regardless of the network input. This study comprehensively investigates the structured pruning of generative PLMs with all the above compressible components. To identify redundant network structures, we assign learnable masks over compressible components followed by sparse training. Various sizes of PLMs can be flexibly extracted via different thresholds, and are then task-specifically fine-tuned for further improvement. Extensive experiments on language modeling, summarization and machine translation validate the effectiveness of the proposed method. For example, the pruned BART brings 1.51×/6.96× inference speedup on GPU/CPU with 67% size reduction, and can be further combined with quantization for more than 25× compression.
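To make the mask-and-threshold idea in the abstract concrete, the following is a minimal PyTorch sketch of structured pruning with learnable masks and threshold-based extraction. It is an illustration only, not the authors' implementation; the names MaskedLinear, sparsity_loss, extract_pruned_sizes, and the choice of a 0.5 threshold are assumptions made for this example.

# Illustrative sketch: learnable structured masks trained with a sparsity penalty,
# then thresholded to read off a smaller sub-network. Not the paper's released code.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose output channels are gated by a learnable, structured mask."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One learnable gate per output channel (structured, not per-weight).
        self.mask_logits = nn.Parameter(torch.zeros(out_features))

    def mask(self) -> torch.Tensor:
        return torch.sigmoid(self.mask_logits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) * self.mask()

def sparsity_loss(model: nn.Module) -> torch.Tensor:
    """L1 penalty on the masks pushes redundant channels toward zero during training."""
    return sum(m.mask().abs().sum() for m in model.modules() if isinstance(m, MaskedLinear))

def extract_pruned_sizes(model: nn.Module, threshold: float = 0.5) -> dict:
    """After sparse training, different thresholds yield differently sized sub-networks."""
    kept = {}
    for name, m in model.named_modules():
        if isinstance(m, MaskedLinear):
            kept[name] = int((m.mask() > threshold).sum().item())
    return kept

# Toy usage: train the masks jointly with the task loss, then count surviving channels.
model = nn.Sequential(MaskedLinear(16, 32), nn.ReLU(), MaskedLinear(32, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(8, 16), torch.randn(8, 4)
for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y) + 1e-3 * sparsity_loss(model)
    loss.backward()
    optimizer.step()
print(extract_pruned_sizes(model, threshold=0.5))

In the paper's setting the masks would cover attention heads, FFN neurons, and the shared hidden dimension of a pretrained Transformer rather than the toy layers above, and the extracted sub-network would then be task-specifically fine-tuned.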
Anthology ID:
2023.findings-acl.692
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10880–10895
URL:
https://aclanthology.org/2023.findings-acl.692
DOI:
10.18653/v1/2023.findings-acl.692
Cite (ACL):
Chaofan Tao, Lu Hou, Haoli Bai, Jiansheng Wei, Xin Jiang, Qun Liu, Ping Luo, and Ngai Wong. 2023. Structured Pruning for Efficient Generative Pre-trained Language Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10880–10895, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Structured Pruning for Efficient Generative Pre-trained Language Models (Tao et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.692.pdf
Video:
https://aclanthology.org/2023.findings-acl.692.mp4