@inproceedings{shamrai-2024-language,
    title = "Language-Specific Pruning for Efficient Reduction of Large Language Models",
    author = "Shamrai, Maksym",
    editor = "Romanyshyn, Mariana and
      Romanyshyn, Nataliia and
      Hlybovets, Andrii and
      Ignatenko, Oleksii",
    booktitle = "Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.unlp-1.16",
    pages = "135--140",
    abstract = "Delving into pruning techniques is essential to boost the efficiency of Large Language Models (LLMs) by reducing their size and computational demands, resulting in faster and more cost-effective inference. In this work, our key contribution lies in recognizing that LLMs trained on diverse languages manifest distinct language-specific weight distributions. Exploiting this insight, we illustrate that pruning LLMs using language-specific data results in a more potent model compression. Empirical evidence underscores the critical nature of pruning on language-specific data, highlighting a noteworthy impact on the perplexity of Ukrainian texts compared to pruning on English data. The proposed methodology significantly reduces the size of LLaMA, LLaMA 2 and Mistral models while preserving competitive performance. This research underscores the significance of linguistic considerations in LLM pruning and advocates for language-specific optimization, establishing a framework for more efficient and tailored language models across diverse linguistic contexts. Additionally, all experiments were conducted using a single consumer-grade NVIDIA RTX 3090 GPU, and the code is available at https://github.com/mshamrai/language-specific-pruning.",
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="shamrai-2024-language">
  <titleInfo>
    <title>Language-Specific Pruning for Efficient Reduction of Large Language Models</title>
  </titleInfo>
  <name type="personal">
    <namePart type="given">Maksym</namePart>
    <namePart type="family">Shamrai</namePart>
    <role>
      <roleTerm authority="marcrelator" type="text">author</roleTerm>
    </role>
  </name>
  <originInfo>
    <dateIssued>2024-05</dateIssued>
  </originInfo>
  <typeOfResource>text</typeOfResource>
  <relatedItem type="host">
    <titleInfo>
      <title>Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Mariana</namePart>
      <namePart type="family">Romanyshyn</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Nataliia</namePart>
      <namePart type="family">Romanyshyn</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Andrii</namePart>
      <namePart type="family">Hlybovets</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Oleksii</namePart>
      <namePart type="family">Ignatenko</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">editor</roleTerm>
      </role>
    </name>
    <originInfo>
      <publisher>ELRA and ICCL</publisher>
      <place>
        <placeTerm type="text">Torino, Italia</placeTerm>
      </place>
    </originInfo>
    <genre authority="marcgt">conference publication</genre>
  </relatedItem>
  <abstract>Delving into pruning techniques is essential to boost the efficiency of Large Language Models (LLMs) by reducing their size and computational demands, resulting in faster and more cost-effective inference. In this work, our key contribution lies in recognizing that LLMs trained on diverse languages manifest distinct language-specific weight distributions. Exploiting this insight, we illustrate that pruning LLMs using language-specific data results in a more potent model compression. Empirical evidence underscores the critical nature of pruning on language-specific data, highlighting a noteworthy impact on the perplexity of Ukrainian texts compared to pruning on English data. The proposed methodology significantly reduces the size of LLaMA, LLaMA 2 and Mistral models while preserving competitive performance. This research underscores the significance of linguistic considerations in LLM pruning and advocates for language-specific optimization, establishing a framework for more efficient and tailored language models across diverse linguistic contexts. Additionally, all experiments were conducted using a single consumer-grade NVIDIA RTX 3090 GPU, and the code is available at https://github.com/mshamrai/language-specific-pruning.</abstract>
  <identifier type="citekey">shamrai-2024-language</identifier>
  <location>
    <url>https://aclanthology.org/2024.unlp-1.16</url>
  </location>
  <part>
    <date>2024-05</date>
    <extent unit="page">
      <start>135</start>
      <end>140</end>
    </extent>
  </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Language-Specific Pruning for Efficient Reduction of Large Language Models
%A Shamrai, Maksym
%Y Romanyshyn, Mariana
%Y Romanyshyn, Nataliia
%Y Hlybovets, Andrii
%Y Ignatenko, Oleksii
%S Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024
%D 2024
%8 May
%I ELRA and ICCL
%C Torino, Italia
%F shamrai-2024-language
%X Delving into pruning techniques is essential to boost the efficiency of Large Language Models (LLMs) by reducing their size and computational demands, resulting in faster and more cost-effective inference. In this work, our key contribution lies in recognizing that LLMs trained on diverse languages manifest distinct language-specific weight distributions. Exploiting this insight, we illustrate that pruning LLMs using language-specific data results in a more potent model compression. Empirical evidence underscores the critical nature of pruning on language-specific data, highlighting a noteworthy impact on the perplexity of Ukrainian texts compared to pruning on English data. The proposed methodology significantly reduces the size of LLaMA, LLaMA 2 and Mistral models while preserving competitive performance. This research underscores the significance of linguistic considerations in LLM pruning and advocates for language-specific optimization, establishing a framework for more efficient and tailored language models across diverse linguistic contexts. Additionally, all experiments were conducted using a single consumer-grade NVIDIA RTX 3090 GPU, and the code is available at https://github.com/mshamrai/language-specific-pruning.
%U https://aclanthology.org/2024.unlp-1.16
%P 135-140

Markdown (Informal)

[Language-Specific Pruning for Efficient Reduction of Large Language Models](https://aclanthology.org/2024.unlp-1.16) (Shamrai, UNLP 2024)

ACL

Maksym Shamrai. 2024. Language-Specific Pruning for Efficient Reduction of Large Language Models. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 135–140, Torino, Italia. ELRA and ICCL.
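
The abstract describes pruning guided by calibration data in the target language; the paper's actual implementation lives in the linked repository (https://github.com/mshamrai/language-specific-pruning). As a rough, hedged illustration of the idea only, the sketch below scores each weight by |w| multiplied by the activation norm of its input feature (a Wanda-style criterion) estimated from calibration text, so swapping Ukrainian for English calibration data changes the mask. The function name, tensor shapes, and the choice of criterion are assumptions for illustration, not necessarily the paper's exact method.

import torch

def prune_with_calibration(weight: torch.Tensor,
                           calib_acts: torch.Tensor,
                           sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scoring weights per output row.

    weight:     (out_features, in_features) layer matrix.
    calib_acts: (n_tokens, in_features) hidden states collected while
                running calibration text (e.g. Ukrainian) through the model.
    """
    # Per-input-feature activation norms estimated from the calibration data.
    feat_norms = calib_acts.norm(p=2, dim=0)            # (in_features,)
    # Wanda-style importance score: |w| * ||x||_2 for each weight.
    scores = weight.abs() * feat_norms.unsqueeze(0)     # (out, in)
    # Indices of the k least important weights within each output row.
    k = int(weight.shape[1] * sparsity)
    _, prune_idx = torch.topk(scores, k, dim=1, largest=False)
    # Build a keep-mask and apply it.
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    return weight * mask

# Toy usage with random stand-ins: calibration activations from a different
# language would give different feature norms and hence a different mask.
torch.manual_seed(0)
w = torch.randn(8, 16)
calib = torch.randn(128, 16)
w_pruned = prune_with_calibration(w, calib, sparsity=0.5)
print((w_pruned == 0).float().mean())  # ~0.5, matching the target sparsity

In a real setting, calib_acts would be gathered with forward hooks on each linear layer while streaming a small language-specific corpus through the model, one layer at a time; that detail is omitted here to keep the sketch self-contained.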