@inproceedings{gao-etal-2025-bypass,
title = "Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient",
author = "Gao, Yuan and
Liu, Zujing and
Zhang, Weizhong and
Du, Bo and
Xia, Gui-Song",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.1421/",
doi = "10.18653/v1/2025.acl-long.1421",
pages = "29356--29377",
ISBN = "979-8-89176-251-0",
abstract = "Recent Large-Language Models (LLMs) pruning methods typically operate at the post-training phase without the expensive weight finetuning, however, their pruning criteria often rely on **heuristically hand-crafted metrics**, potentially leading to suboptimal performance. We instead propose a novel **optimization-based structural pruning** that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. To preserve the efficiency, our method **eliminates the back-propagation** through the LLM *per se* during the optimization, requiring only **the forward pass of the LLM**. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks, where we decouple the Bernoulli parameters from the LLM loss, thus facilitating an efficient optimization via *policy gradient estimator* without back-propagation. As a result, our method is able to 1) *support global and heterogeneous pruning* (*i.e.*, our method automatically determines different redundancy for different layers), and 2) *optionally initialize with a metric-based method* (for our Bernoulli distributions). Extensive experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models using the C4 and WikiText2 datasets demonstrate the promising performance of our method in efficiency and effectiveness."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="gao-etal-2025-bypass">
<titleInfo>
<title>Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient</title>
</titleInfo>
<name type="personal">
<namePart type="given">Yuan</namePart>
<namePart type="family">Gao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zujing</namePart>
<namePart type="family">Liu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Weizhong</namePart>
<namePart type="family">Zhang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bo</namePart>
<namePart type="family">Du</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gui-Song</namePart>
<namePart type="family">Xia</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-07</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Wanxiang</namePart>
<namePart type="family">Che</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Joyce</namePart>
<namePart type="family">Nabende</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ekaterina</namePart>
<namePart type="family">Shutova</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Taher</namePart>
<namePart type="family">Pilehvar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Vienna, Austria</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-251-0</identifier>
</relatedItem>
<abstract>Recent Large Language Model (LLM) pruning methods typically operate at the post-training phase without expensive weight finetuning; however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading to suboptimal performance. We instead propose a novel optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. To preserve efficiency, our method eliminates back-propagation through the LLM per se during the optimization, requiring only the forward pass of the LLM. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks, where we decouple the Bernoulli parameters from the LLM loss, thus facilitating an efficient optimization via a policy gradient estimator without back-propagation. As a result, our method is able to 1) support global and heterogeneous pruning (i.e., our method automatically determines different redundancy for different layers), and 2) optionally initialize with a metric-based method (for our Bernoulli distributions). Extensive experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models using the C4 and WikiText2 datasets demonstrate the promising performance of our method in efficiency and effectiveness.</abstract>
<identifier type="citekey">gao-etal-2025-bypass</identifier>
<identifier type="doi">10.18653/v1/2025.acl-long.1421</identifier>
<location>
<url>https://aclanthology.org/2025.acl-long.1421/</url>
</location>
<part>
<date>2025-07</date>
<extent unit="page">
<start>29356</start>
<end>29377</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient
%A Gao, Yuan
%A Liu, Zujing
%A Zhang, Weizhong
%A Du, Bo
%A Xia, Gui-Song
%Y Che, Wanxiang
%Y Nabende, Joyce
%Y Shutova, Ekaterina
%Y Pilehvar, Mohammad Taher
%S Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
%D 2025
%8 July
%I Association for Computational Linguistics
%C Vienna, Austria
%@ 979-8-89176-251-0
%F gao-etal-2025-bypass
%X Recent Large Language Model (LLM) pruning methods typically operate at the post-training phase without expensive weight finetuning; however, their pruning criteria often rely on heuristically hand-crafted metrics, potentially leading to suboptimal performance. We instead propose a novel optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. To preserve efficiency, our method eliminates back-propagation through the LLM per se during the optimization, requiring only the forward pass of the LLM. We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks, where we decouple the Bernoulli parameters from the LLM loss, thus facilitating an efficient optimization via a policy gradient estimator without back-propagation. As a result, our method is able to 1) support global and heterogeneous pruning (i.e., our method automatically determines different redundancy for different layers), and 2) optionally initialize with a metric-based method (for our Bernoulli distributions). Extensive experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models using the C4 and WikiText2 datasets demonstrate the promising performance of our method in efficiency and effectiveness.
%R 10.18653/v1/2025.acl-long.1421
%U https://aclanthology.org/2025.acl-long.1421/
%U https://doi.org/10.18653/v1/2025.acl-long.1421
%P 29356-29377
Markdown (Informal)
[Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient](https://aclanthology.org/2025.acl-long.1421/) (Gao et al., ACL 2025)
ACL
Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, and Gui-Song Xia. 2025. Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29356–29377, Vienna, Austria. Association for Computational Linguistics.
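
To make the mechanism described in the abstract concrete: the method samples binary pruning masks from learned Bernoulli distributions and updates the Bernoulli parameters with a policy-gradient (score-function) estimator, so only forward passes through the model are required. Below is a minimal, illustrative PyTorch sketch of such an estimator on a toy loss; the names `policy_gradient_prune_step` and `toy_loss` are hypothetical stand-ins for illustration and are not taken from the paper's code, which additionally handles global sparsity constraints and metric-based initialization.

```python
import torch

torch.manual_seed(0)


def policy_gradient_prune_step(mask_loss_fn, logits, lr=0.2, num_samples=8):
    """One REINFORCE-style update of Bernoulli keep-probabilities.

    mask_loss_fn maps a {0,1} mask to a scalar loss using forward passes
    only; no gradient ever flows through the (stand-in) model itself.
    """
    probs = torch.sigmoid(logits)
    masks = [torch.bernoulli(probs) for _ in range(num_samples)]
    losses = torch.stack([mask_loss_fn(m) for m in masks])
    baseline = losses.mean()  # simple baseline for variance reduction
    grad = torch.zeros_like(logits)
    for m, l in zip(masks, losses):
        # Score function of a Bernoulli parameterized by sigmoid(logits):
        # d/dz log p(m | sigmoid(z)) = m - sigmoid(z)
        grad += (l - baseline) * (m - probs)
    return logits - lr * grad / num_samples  # descend the expected loss


# Toy stand-in for "the loss of the pruned model": only the first two of
# four units matter, so learning should keep them and prune the other two.
important = torch.tensor([1.0, 1.0, 0.0, 0.0])


def toy_loss(mask):
    return ((mask - important) ** 2).sum()


logits = torch.zeros(4)  # keep-probability 0.5 for every unit initially
for _ in range(300):
    logits = policy_gradient_prune_step(toy_loss, logits)
print(torch.sigmoid(logits))  # ~[1, 1, 0, 0]: keep only the important units
```

In the paper's setting, `toy_loss` would be replaced by the forward-pass loss of the LLM with the sampled mask applied to its structural units; because the Bernoulli parameters are decoupled from that loss, the update above never back-propagates through the model.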