EfficientLLM: Unified Pruning-Aware Pretraining for Auto-Designed Compact Language Models

Xingrun Xing; Zheng Liu; Shitao Xiao; Boyan Gao; Yiming Liang; Haokun Lin; Xianlin Zeng; Guoqi Li; Jiajun Zhang

Abstract

Modern large language models (LLMs), driven by scaling laws, exhibit emergent intelligence at large model sizes. Recently, growing concerns about cloud costs, latency, and privacy have made it urgent to develop compact edge language models. Unlike direct pretraining, which is bounded by the parameter scaling law, this work proposes unified pruning-aware pretraining, termed EfficientLLM, which focuses on pretraining compact models while preserving the performance of much larger source models. It has the following characteristics: 1) Pruning in Pretraining Corpus: we introduce minimal parameter groups to decouple LLMs and continuously optimize the model architecture with classic pruning methods such as LLM-Pruner and SparseGPT during pretraining. We show that scaling up LLM pruning to large-scale pretraining yields top-quality compact language models. 2) Auto-Designed Architecture: the LLM architecture is designed automatically during saliency-driven pruning, unifying pretraining, architectural design, and parameter pruning into a single process. Based on these, EfficientLLM significantly outperforms directly pretrained baselines with 100M to 1B parameters, such as MobileLLM, SmolLM, Qwen2.5-0.5B, OLMo-1B, and Llama3.2-1B, on common-sense benchmarks, bridging the performance gap between traditional LLM compression and direct pretraining. The code is open-sourced at https://github.com/Xingrun-Xing2/EfficientLLM.
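To make saliency-driven pruning of minimal parameter groups more concrete, here is a minimal sketch, not the authors' implementation. It assumes each group (for example, one attention head or one FFN channel) is scored with a first-order Taylor criterion, the sum of |w·g| over the group's weights and gradients, similar in spirit to the importance estimate used by LLM-Pruner, and that only the most salient groups are kept. The names `group_saliency`, `select_groups_to_keep`, and `Group`, and the toy values, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each group maps to (weights, gradients), flattened.
Group = Tuple[List[float], List[float]]


def group_saliency(weights: List[float], grads: List[float]) -> float:
    """First-order Taylor saliency of one parameter group: sum_i |w_i * g_i|.

    This is a common first-order proxy for how much the loss would change
    if the whole group were removed, so it can rank structures for pruning.
    """
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    return sum(abs(w * g) for w, g in zip(weights, grads))


def select_groups_to_keep(groups: Dict[str, Group], keep: int) -> List[str]:
    """Return the names of the `keep` most salient groups, sorted by name."""
    scores = {name: group_saliency(w, g) for name, (w, g) in groups.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    toy_groups: Dict[str, Group] = {
        "head0": ([0.5, -0.2], [0.1, 0.3]),   # saliency 0.05 + 0.06 = 0.11
        "head1": ([1.0, 0.0], [0.01, 0.5]),   # saliency 0.01 + 0.00 = 0.01
        "head2": ([-0.4, 0.8], [0.5, -0.5]),  # saliency 0.20 + 0.40 = 0.60
    }
    print(select_groups_to_keep(toy_groups, keep=2))  # ['head0', 'head2']
```

In a pruning-aware pretraining loop, one could (as an assumption) recompute these scores periodically on pretraining batches and progressively drop the least salient groups, so that the surviving head counts and channel widths define the compact architecture.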

Anthology ID: 2026.acl-long.355
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 7813–7830
URL: https://aclanthology.org/2026.acl-long.355/
Cite (ACL): Xingrun Xing, Zheng Liu, Shitao Xiao, Boyan Gao, Yiming Liang, Haokun Lin, Xianlin Zeng, Guoqi Li, and Jiajun Zhang. 2026. EfficientLLM: Unified Pruning-Aware Pretraining for Auto-Designed Compact Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7813–7830, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): EfficientLLM: Unified Pruning-Aware Pretraining for Auto-Designed Compact Language Models (Xing et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.355.pdf
Checklist: 2026.acl-long.355.checklist.pdf
