Jingwei Sun


2024

pdf bib
Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning
Honghe Zhang | XiaolongShi XiaolongShi | Jingwei Sun | Guangzhong Sun
Findings of the Association for Computational Linguistics: NAACL 2024

Large language models (LLMs) have demonstrated powerful capabilities in natural language processing, yet their vast number of parameters poses challenges for deployment and inference efficiency. Structured model pruning emerges as a viable approach to reduce model size and accelerate inference, without requiring specialized operators and libraries for deployment. However, structured pruning often severely weakens the model’s capability.Despite repetitive fine-tuning can restore the capability to a certain extent, it impairs LLMs’ utility as versatile problem solvers.To address this issue, we propose a novel structured pruning algorithm tailored for LLMs. It derives the importance of different components, namely rows and columns in parameter matrices, based on intermediate data dependencies. Then it removes coupled components across different layers simultaneously and preserves dependency relationships within remaining parameters, avoiding significant performance degradation. The pruned model requires only few epochs of fine-tuning to restore its performance, ensuring the model’s ability to generalize.Empirical evaluations on LLaMA, Vicuna, and ChatGLM3 demonstrate our algorithm’s efficacy, yielding 20% parameter reduction while retaining at least 94.4% of original performance metrics.