TAGQuant: Token-Aware Clustering for Group-Wise Quantization

Jaeseong Lee; Seung-won Hwang; Aurick Qiao; Zhewei Yao; Yuxiong He

Abstract

Grouping (e.g., of channels), already widely used in integer-based quantization, has become essential for the emerging MXFP4 format. Ideally, each group should contain channels with similar quantization scales. To form such groups, existing work clusters channels using a scalar proxy that ignores the token dimension, which we find suboptimal. In this paper, we propose TAGQuant, a simple yet powerful enhancement for such "group-wise" quantization. By strategically shuffling channels so that those with similar token-wise activation distributions land in the same group, TAGQuant better separates large-range values from small-range ones. This shuffle operation is hardware-efficient and integrates seamlessly into the quantization process with only 0.01x latency overhead. Compared to baselines, TAGQuant reduces relative GSM8K error in both INT4 and MXFP4 formats by up to 86% on Llama-3.1-8B-Instruct, validating the effectiveness of our channel-shuffling approach for group-wise quantization. Code is publicly available.
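The abstract does not spell out the clustering algorithm, so what follows is only a minimal pure-Python sketch of the general idea under stated assumptions, not the authors' implementation. Assumed: symmetric per-token INT4 quantization over contiguous channel groups; a channel's "token-wise profile" taken as its absolute values across tokens; a hypothetical greedy grouping heuristic (`token_aware_permutation`); and a per-channel absolute maximum standing in for the "scalar proxy". All function names are illustrative.

```python
from typing import List, Optional

Matrix = List[List[float]]  # activations: rows are tokens, columns are channels


def quantize_groupwise(x: Matrix, group_size: int, bits: int = 4) -> Matrix:
    """Symmetric per-token, per-group integer quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4
    out = []
    for row in x:
        deq = []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            scale = max(abs(v) for v in group) / qmax  # one scale per group
            if scale == 0.0:
                deq.extend(0.0 for _ in group)
                continue
            for v in group:
                q = max(-qmax - 1, min(qmax, round(v / scale)))
                deq.append(q * scale)
        out.append(deq)
    return out


def permute_channels(x: Matrix, perm: List[int]) -> Matrix:
    """Reorder channels (columns) of every token according to perm."""
    return [[row[c] for c in perm] for row in x]


def quant_error(x: Matrix, group_size: int, perm: Optional[List[int]] = None) -> float:
    """Sum of squared quantization errors, optionally after a channel shuffle."""
    xp = x if perm is None else permute_channels(x, perm)
    xq = quantize_groupwise(xp, group_size)
    return sum((a - b) ** 2 for ra, rb in zip(xp, xq) for a, b in zip(ra, rb))


def scalar_proxy_permutation(x: Matrix) -> List[int]:
    """Assumed baseline: sort channels by one scalar (abs-max over all tokens)."""
    n_ch = len(x[0])
    return sorted(range(n_ch), key=lambda c: -max(abs(row[c]) for row in x))


def token_aware_permutation(x: Matrix, group_size: int) -> List[int]:
    """Hypothetical greedy clustering on token-wise magnitude profiles."""
    n_ch = len(x[0])
    profiles = [[abs(row[c]) for row in x] for c in range(n_ch)]
    unassigned = set(range(n_ch))
    perm: List[int] = []
    while unassigned:
        # Seed a group with the remaining channel of largest peak magnitude
        # (ties broken by lowest channel index).
        seed = max(sorted(unassigned), key=lambda c: max(profiles[c]))
        unassigned.remove(seed)
        # Fill the group with the channels whose per-token profiles are closest.
        def dist(c: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(profiles[c], profiles[seed]))
        nearest = sorted(unassigned, key=lambda c: (dist(c), c))[: group_size - 1]
        for c in nearest:
            unassigned.remove(c)
        perm.extend([seed] + nearest)
    return perm


if __name__ == "__main__":
    # Channels 0/2 spike on token 0, channels 1/3 spike on token 1:
    # every channel has the same abs-max, so a scalar proxy cannot separate them.
    x = [[100.0, 1.0, 100.0, 1.0], [1.0, 100.0, 1.0, 100.0]]
    print("identity     :", quant_error(x, 2))
    print("scalar proxy :", quant_error(x, 2, scalar_proxy_permutation(x)))
    print("token-aware  :", quant_error(x, 2, token_aware_permutation(x, 2)))
```

In this toy input the scalar proxy sees identical abs-max values for all four channels and keeps the original order, so each group mixes a 100 with a 1 on every token and the small values round to zero. Grouping by token-wise profile pairs channels that spike on the same token instead. In a real layer, the same permutation would also have to be applied to the weight's input dimension so that the matrix product is unchanged.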

Anthology ID:: 2026.eacl-industry.18
Volume:: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
Month:: March
Year:: 2026
Address:: Rabat, Morocco
Editors:: Yevgen Matusevych, Gülşen Eryiğit, Nikolaos Aletras
Venue:: EACL
Publisher:: Association for Computational Linguistics
Pages:: 253–262
URL:: https://aclanthology.org/2026.eacl-industry.18/
Cite (ACL):: Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Zhewei Yao, and Yuxiong He. 2026. TAGQuant: Token-Aware Clustering for Group-Wise Quantization. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track), pages 253–262, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):: TAGQuant: Token-Aware Clustering for Group-Wise Quantization (Lee et al., EACL 2026)
PDF:: https://aclanthology.org/2026.eacl-industry.18.pdf