Global-Pruner: A Stable and Efficient Pruner for Retraining-Free Pruning of Encoder-Based Language Models

Guangzhen Yao, Yuehan Wang, Hui Xu, Long Zhang, MiaoQI MiaoQI


Abstract
Large language models (LLMs) have achieved significant success in complex tasks across various domains, but they come with high computational costs and inference latency. Pruning is an effective way to reduce these inference costs. However, current pruning algorithms for encoder-based language models often settle for locally optimal solutions and neglect a comprehensive exploration of the global solution space. This oversight can destabilize the solution process and degrade the overall performance of the model. To address these challenges, we propose a structured pruning algorithm named G-Pruner (Global Pruner), which comprises two integral components, PPOM (Proximal Policy Optimization Mask) and CG²MT (Conjugate Gradient Squared Mask Tuning), and adopts a global optimization strategy. This strategy not only eliminates the need for retraining but also ensures the algorithm's stability and adaptability to environmental changes, effectively addressing the issue of focusing solely on immediate optima while neglecting long-term effects. The method is evaluated on the GLUE and SQuAD benchmarks using BERT-base and DistilBERT models. The experimental results show that, without any retraining, G-Pruner achieves significant accuracy improvements on the SQuAD 2.0 task under a 60% FLOPs constraint, with a 6.02% increase in F1 score over baseline algorithms.
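To make the retraining-free structured-pruning setting described in the abstract concrete, the sketch below removes attention heads from an encoder model under a per-layer budget, assuming the HuggingFace transformers library. The uniform random importance scores and the fixed 40% drop ratio are placeholder assumptions for illustration only; they are not the paper's PPOM or CG²MT mask-optimization procedure.

```python
# Minimal sketch of retraining-free structured pruning of attention heads.
# Assumption: the random importance scores below stand in for whatever a
# real pruner (e.g., a mask optimized under a FLOPs constraint) would produce.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
num_layers = model.config.num_hidden_layers    # 12 for BERT-base
num_heads = model.config.num_attention_heads   # 12 heads per layer

# Hypothetical budget: drop ~40% of heads in every layer to cut attention FLOPs.
heads_to_drop = int(0.4 * num_heads)

# Placeholder head-importance scores (random here, NOT the paper's method).
scores = torch.rand(num_layers, num_heads)

# For each layer, mark the lowest-scoring heads for removal.
heads_to_prune = {
    layer: scores[layer].argsort()[:heads_to_drop].tolist()
    for layer in range(num_layers)
}

# Structurally remove the selected heads; no retraining is performed afterwards.
model.prune_heads(heads_to_prune)
print(model.config.pruned_heads)
```

The pruned model can then be evaluated directly on a downstream benchmark; in a mask-based pruner, the binary keep/drop decision per head would come from the optimized mask rather than from the placeholder scores used here.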
Anthology ID:
2024.conll-1.5
Volume:
Proceedings of the 28th Conference on Computational Natural Language Learning
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Libby Barak, Malihe Alikhani
Venue:
CoNLL
Publisher:
Association for Computational Linguistics
Pages:
46–55
URL:
https://aclanthology.org/2024.conll-1.5
Cite (ACL):
Guangzhen Yao, Yuehan Wang, Hui Xu, Long Zhang, and MiaoQI MiaoQI. 2024. Global-Pruner: A Stable and Efficient Pruner for Retraining-Free Pruning of Encoder-Based Language Models. In Proceedings of the 28th Conference on Computational Natural Language Learning, pages 46–55, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Global-Pruner: A Stable and Efficient Pruner for Retraining-Free Pruning of Encoder-Based Language Models (Yao et al., CoNLL 2024)
PDF:
https://aclanthology.org/2024.conll-1.5.pdf