Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models

He Xiao; Qingyao Yang; Dirui Xie; Wendong XU; Zunhai Su; Runming Yang; Haobo Liu; Wenyong Zhou; Zhengwu Liu; Ngai Wong

Abstract

Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ (Layer-wise information effectiveness Quantization), a hardware-native, metric-driven post-training quantization framework that addresses the critical challenge of maintaining accuracy in sub-8B models (models with fewer than 8B parameters) under extreme low-bit compression. LieQ keeps a uniform bit-width within each layer while mixing precision across layers, which preserves standard multiplication kernels and avoids irregular memory access, codebooks, and non-standard formats at inference time. Our method uncovers a strong correlation between layer-wise functional saliency and representational compactness, revealing that layers with higher training-induced energy concentration are functionally irreplaceable. Leveraging this insight, we propose a purely geometry-driven sensitivity proxy that enables automatic bit-width allocation under a target average-bit budget without expensive gradient updates or inference-based perplexity probing. At an average weight bit-width approaching two bits per parameter, LieQ consistently reduces the large accuracy gap typically observed for naive uniform 2-bit baselines on the Qwen3 and LLaMA3.x families, while retaining standard-kernel efficiency. These properties make LieQ a practical path toward deploying small language models on resource-constrained edge devices. Code will be available at: https://github.com/HeXiao-55/LieQ-official.git.
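
To make the idea of allocating per-layer bit-widths under a target average-bit budget concrete, here is a minimal sketch. It is not the authors' algorithm: the greedy rule, the function names `allocate_bits` and `average_bits`, the two-level choice of 2 and 4 bits, and the example sensitivity scores are all assumptions made for illustration. The abstract only states that a geometry-driven sensitivity proxy drives the allocation; how that proxy is computed is not shown here.

```python
from typing import List, Sequence


def allocate_bits(
    scores: Sequence[float],
    sizes: Sequence[int],
    avg_budget: float,
    low: int = 2,
    high: int = 4,
) -> List[int]:
    """Hypothetical greedy allocator (an illustration, not LieQ itself).

    Every layer starts at `low` bits. Layers are then visited from the
    highest sensitivity score to the lowest and upgraded to `high` bits
    whenever the parameter-weighted average stays within `avg_budget`.
    Each layer keeps a single uniform bit-width.
    """
    total_params = sum(sizes)
    bits = [low] * len(scores)
    used = low * total_params              # bits spent so far
    limit = avg_budget * total_params      # total bits allowed by the budget
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for i in order:
        extra = (high - low) * sizes[i]    # cost of upgrading layer i
        if used + extra <= limit:
            bits[i] = high
            used += extra
    return bits


def average_bits(bits: Sequence[int], sizes: Sequence[int]) -> float:
    """Parameter-weighted average bit-width across layers."""
    return sum(b * s for b, s in zip(bits, sizes)) / sum(sizes)


# Made-up sensitivity scores for four equally sized layers.
scores = [0.9, 0.1, 0.4, 0.2]
sizes = [100, 100, 100, 100]
bits = allocate_bits(scores, sizes, avg_budget=2.5)
print(bits, average_bits(bits, sizes))
```

With these made-up inputs, only the most sensitive layer is upgraded, since upgrading a second layer would push the weighted average above 2.5 bits. Because every layer keeps one bit-width, a scheme of this shape stays compatible with standard per-layer quantized kernels, which is the property the abstract emphasizes.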

Anthology ID: 2026.findings-acl.771
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 15753–15764
URL: https://aclanthology.org/2026.findings-acl.771/
Cite (ACL): He Xiao, Qingyao Yang, Dirui Xie, Wendong XU, Zunhai Su, Runming Yang, Haobo Liu, Wenyong Zhou, Zhengwu Liu, and Ngai Wong. 2026. Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 15753–15764, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models (Xiao et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.771.pdf
Checklist: 2026.findings-acl.771.checklist.pdf
