Reducing Token Redundancy in LVLMs: A Systematic Review of Token Pruning Methods

Hanzhang Yuan; Mengxuan Hu; Wenhao Zhang; Tianlong Wang; Zhongliang Zhou; Jiasen Lu; Sheng Li

Abstract

Large Vision-Language Models (LVLMs) excel at visual understanding but face severe computational bottlenecks when processing high-resolution images and long videos, because these inputs produce massive visual token counts. Token pruning mitigates this by selectively removing less informative tokens while maintaining performance. However, existing methods vary widely in pruning location (vision encoder vs. LLM decoder), importance criteria (attention vs. similarity vs. learned scores), and application strategy, and they have not been systematically compared. This survey presents the first comprehensive review of token pruning for LVLMs. We propose a taxonomy that categorizes methods into vision-side, LLM-side, and hybrid paradigms, and we systematically analyze token selection mechanisms and pruning strategies. We further discuss evaluation protocols and identify key challenges, including prompt-adaptive pruning and hardware-aware design. Our survey provides a structured foundation for this rapidly growing research area.

Anthology ID: 2026.acl-long.328
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 7231–7251
URL: https://aclanthology.org/2026.acl-long.328/
Cite (ACL): Hanzhang Yuan, Mengxuan Hu, Wenhao Zhang, Tianlong Wang, Zhongliang Zhou, Jiasen Lu, and Sheng Li. 2026. Reducing Token Redundancy in LVLMs: A Systematic Review of Token Pruning Methods. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7231–7251, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Reducing Token Redundancy in LVLMs: A Systematic Review of Token Pruning Methods (Yuan et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.328.pdf
Checklist: 2026.acl-long.328.checklist.pdf
