VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models

Hanling Zhang; Yayu Zhou; Tongcheng Fang; Zhihang Yuan; Guohao Dai; Wanli Ouyang; Yu Wang

Abstract

Small Language Models (SLMs) provide computational advantages in resource-constrained environments, yet memory limitations remain a critical bottleneck for edge-device deployment. A substantial portion of an SLM's memory footprint stems from vocabulary-related components, particularly the embeddings and the language modeling (LM) head, because of large vocabulary sizes. Existing static vocabulary pruning reduces memory usage, but its rigid, one-size-fits-all design causes information loss during the prefill stage and lacks flexibility. In this work, we identify two key principles underlying the vocabulary reduction challenge: the *lexical locality* principle, the observation that only a small subset of tokens is required during any single inference, and the *asymmetry in computational characteristics* between the vocabulary-related components of SLMs. Based on these insights, we introduce VocabTailor, a novel decoupled dynamic vocabulary selection framework that addresses memory constraints by offloading the embedding layer and by applying a hybrid static-dynamic vocabulary selection strategy to the LM head, enabling on-demand loading of vocabulary components. Comprehensive experiments across diverse downstream tasks demonstrate that VocabTailor reduces the memory usage of vocabulary-related components by up to 99% with minimal or no degradation in task performance, substantially outperforming existing static vocabulary pruning. Our code is available at https://github.com/AwakenedInsects/VocabTailor.
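
The abstract names the mechanism only at a high level. Below is a minimal, hypothetical sketch of the two ideas it describes, written under our own assumptions rather than taken from the VocabTailor code. First, embedding rows are fetched on demand from an offloaded table, and only for the tokens present in the input. Second, the LM head scores an active vocabulary formed as the union of a fixed, task-level static set and the tokens of the current prompt. Several details are illustrative assumptions: the union rule, the dict standing in for off-accelerator storage, and every function name (`select_active_vocab`, `fetch_embeddings`, `restricted_logits`, `greedy_next_token`, `vocab_memory_bytes`). The memory helper simply counts parameter bytes for the embedding and LM head matrices. All sizes used here are toy values.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def select_active_vocab(static_ids: Set[int], prompt_ids: Sequence[int]) -> List[int]:
    """Hypothetical hybrid rule: a fixed task-level set unioned with the prompt's tokens."""
    return sorted(set(static_ids) | set(prompt_ids))


def fetch_embeddings(prompt_ids: Sequence[int], offloaded_table: Dict[int, Vector]) -> List[Vector]:
    """On-demand lookup touching only rows of tokens present in the input.

    The dict is an assumed stand-in for an embedding table kept off the accelerator.
    """
    return [offloaded_table[t] for t in prompt_ids]


def restricted_logits(hidden: Vector, lm_head: Sequence[Vector], active_ids: Sequence[int]) -> Dict[int, float]:
    """Dot products against the active LM-head rows only, keyed by original token ID."""
    return {t: sum(h * w for h, w in zip(hidden, lm_head[t])) for t in active_ids}


def greedy_next_token(hidden: Vector, lm_head: Sequence[Vector], active_ids: Sequence[int]) -> int:
    """Greedy decoding restricted to the active vocabulary; returns an original token ID."""
    logits = restricted_logits(hidden, lm_head, active_ids)
    return max(logits, key=logits.get)


def vocab_memory_bytes(vocab_size: int, hidden_size: int, bytes_per_param: int = 2, tied: bool = False) -> int:
    """Parameter bytes of the embedding matrix plus the LM head (counted once if tied)."""
    matrices = 1 if tied else 2
    return matrices * vocab_size * hidden_size * bytes_per_param


if __name__ == "__main__":
    # Toy model: 8-token vocabulary, hidden size 2 (hypothetical values).
    lm_head = [[0.1 * i, -0.05 * i] for i in range(8)]
    table = {i: row for i, row in enumerate(lm_head)}  # pretend weights are tied
    active = select_active_vocab(static_ids={0, 1, 2}, prompt_ids=[5, 2, 5])
    print(active)
    print(len(fetch_embeddings([5, 2], table)))
    print(greedy_next_token([1.0, 0.0], lm_head, active))
    print(vocab_memory_bytes(32000, 1024))
```

In this toy run the restricted head can only emit token 5, while a full-vocabulary head would pick token 7. This shows the trade-off such a scheme must manage: restricting the active set saves memory, but it can only return tokens that the selection rule keeps.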

Anthology ID: 2026.findings-acl.1418
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 28453–28471
URL: https://aclanthology.org/2026.findings-acl.1418/
Cite (ACL): Hanling Zhang, Yayu Zhou, Tongcheng Fang, Zhihang Yuan, Guohao Dai, Wanli Ouyang, and Yu Wang. 2026. VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 28453–28471, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models (Zhang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1418.pdf
Checklist: 2026.findings-acl.1418.checklist.pdf