Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning

Honghe Zhang, XiaolongShi XiaolongShi, Jingwei Sun, Guangzhong Sun


Abstract
Large language models (LLMs) have demonstrated powerful capabilities in natural language processing, yet their vast number of parameters poses challenges for deployment and inference efficiency. Structured model pruning emerges as a viable approach to reduce model size and accelerate inference, without requiring specialized operators and libraries for deployment. However, structured pruning often severely weakens the model's capability. Although repetitive fine-tuning can restore the capability to a certain extent, it impairs LLMs' utility as versatile problem solvers. To address this issue, we propose a novel structured pruning algorithm tailored for LLMs. It derives the importance of different components, namely rows and columns in parameter matrices, based on intermediate data dependencies. Then it removes coupled components across different layers simultaneously and preserves dependency relationships within the remaining parameters, avoiding significant performance degradation. The pruned model requires only a few epochs of fine-tuning to restore its performance, ensuring the model's ability to generalize. Empirical evaluations on LLaMA, Vicuna, and ChatGLM3 demonstrate our algorithm's efficacy, yielding a 20% parameter reduction while retaining at least 94.4% of the original performance metrics.
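
The coupled-component elimination described in the abstract can be illustrated with a small sketch. The code below is not the authors' released implementation; it is a minimal illustration, assuming a toy two-layer feed-forward block, an activation-magnitude importance score, and a 20% pruning ratio (only the ratio appears in the abstract). It shows the coupling: removing a hidden unit deletes the corresponding row of the first weight matrix together with the corresponding column of the second, so the data dependency between the two layers is preserved.

# Minimal sketch (illustrative assumptions, not the paper's implementation)
import torch
import torch.nn as nn

def prune_coupled_hidden_units(fc1: nn.Linear, fc2: nn.Linear,
                               calib_x: torch.Tensor, ratio: float = 0.2):
    # Estimate the importance of each hidden unit from intermediate
    # activations on a small calibration batch (assumed scoring rule).
    with torch.no_grad():
        hidden = torch.relu(fc1(calib_x))          # (batch, hidden_dim)
        importance = hidden.abs().mean(dim=0)      # one score per hidden unit

    # Keep the most important (1 - ratio) fraction of hidden units.
    n_keep = int(hidden.size(1) * (1.0 - ratio))
    keep = torch.argsort(importance, descending=True)[:n_keep]

    # Coupled elimination: dropping hidden unit j removes row j of fc1
    # (and its bias entry) and column j of fc2 simultaneously, keeping
    # the remaining parameters' dependency relationships intact.
    new_fc1 = nn.Linear(fc1.in_features, n_keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(n_keep, fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep, :])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2

In this sketch the pruned block computes the same function restricted to the retained hidden units, which is why only a few epochs of fine-tuning are expected to recover most of the performance.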
Anthology ID:
2024.findings-naacl.1
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1–12
URL:
https://aclanthology.org/2024.findings-naacl.1
Cite (ACL):
Honghe Zhang, XiaolongShi XiaolongShi, Jingwei Sun, and Guangzhong Sun. 2024. Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 1–12, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Structured Pruning for Large Language Models Using Coupled Components Elimination and Minor Fine-tuning (Zhang et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.1.pdf
Copyright:
2024.findings-naacl.1.copyright.pdf