@inproceedings{syu-lee-2025-hierarchical,
title = "Hierarchical Speculative Decoding with Dynamic Window",
author = "Syu, Shensian and
Lee, Hung-yi",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-naacl.462/",
doi = "10.18653/v1/2025.findings-naacl.462",
pages = "8260--8273",
ISBN = "979-8-89176-195-7",
abstract = "Speculative Decoding (SD) utilizes an efficient draft model to generate multiple tokens, which are subsequently verified in parallel by a target model. This approach has shown significant potential for accelerating inference in large language models (LLMs), with performance heavily reliant on the hyperparameter $K${---}the window size. However, previous methods often depend on simple heuristics to select $K$ or dynamically adjust the window size, which may necessitate additional training or careful resource management to avoid competition. To address these challenges, we propose \textbf{H}ierarchical \textbf{S}peculative \textbf{D}ecoding with \textbf{D}ynamic \textbf{W}indow (HSDDW), a straightforward framework that eliminates the need for additional training. Specifically, we introduce a \textit{self-verify} mechanism that enables the draft model to autonomously decide when to stop generating tokens. Additionally, by integrating a hierarchical structure that leverages the capabilities of models of different sizes, we significantly enhance the overall speed of the system. HSDDW demonstrates competitive performance across four datasets, achieving notable speedups of $2.91\times$ on MT-Bench and $2.99\times$ on Alpaca, outperforming existing state-of-the-art methods."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="syu-lee-2025-hierarchical">
<titleInfo>
<title>Hierarchical Speculative Decoding with Dynamic Window</title>
</titleInfo>
<name type="personal">
<namePart type="given">Shensian</namePart>
<namePart type="family">Syu</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hung-yi</namePart>
<namePart type="family">Lee</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-04</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Findings of the Association for Computational Linguistics: NAACL 2025</title>
</titleInfo>
<name type="personal">
<namePart type="given">Luis</namePart>
<namePart type="family">Chiruzzo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Alan</namePart>
<namePart type="family">Ritter</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lu</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, New Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-195-7</identifier>
</relatedItem>
<abstract>Speculative Decoding (SD) utilizes an efficient draft model to generate multiple tokens, which are subsequently verified in parallel by a target model. This approach has shown significant potential for accelerating inference in large language models (LLMs), with performance heavily reliant on the hyperparameter K—the window size. However, previous methods often depend on simple heuristics to select K or dynamically adjust the window size, which may necessitate additional training or careful resource management to avoid competition. To address these challenges, we propose Hierarchical Speculative Decoding with Dynamic Window (HSDDW), a straightforward framework that eliminates the need for additional training. Specifically, we introduce a self-verify mechanism that enables the draft model to autonomously decide when to stop generating tokens. Additionally, by integrating a hierarchical structure that leverages the capabilities of models of different sizes, we significantly enhance the overall speed of the system. HSDDW demonstrates competitive performance across four datasets, achieving notable speedups of 2.91× on MT-Bench and 2.99× on Alpaca, outperforming existing state-of-the-art methods.</abstract>
<identifier type="citekey">syu-lee-2025-hierarchical</identifier>
<identifier type="doi">10.18653/v1/2025.findings-naacl.462</identifier>
<location>
<url>https://aclanthology.org/2025.findings-naacl.462/</url>
</location>
<part>
<date>2025-04</date>
<extent unit="page">
<start>8260</start>
<end>8273</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Hierarchical Speculative Decoding with Dynamic Window
%A Syu, Shensian
%A Lee, Hung-yi
%Y Chiruzzo, Luis
%Y Ritter, Alan
%Y Wang, Lu
%S Findings of the Association for Computational Linguistics: NAACL 2025
%D 2025
%8 April
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-195-7
%F syu-lee-2025-hierarchical
%X Speculative Decoding (SD) utilizes an efficient draft model to generate multiple tokens, which are subsequently verified in parallel by a target model. This approach has shown significant potential for accelerating inference in large language models (LLMs), with performance heavily reliant on the hyperparameter K—the window size. However, previous methods often depend on simple heuristics to select K or dynamically adjust the window size, which may necessitate additional training or careful resource management to avoid competition. To address these challenges, we propose Hierarchical Speculative Decoding with Dynamic Window (HSDDW), a straightforward framework that eliminates the need for additional training. Specifically, we introduce a self-verify mechanism that enables the draft model to autonomously decide when to stop generating tokens. Additionally, by integrating a hierarchical structure that leverages the capabilities of models of different sizes, we significantly enhance the overall speed of the system. HSDDW demonstrates competitive performance across four datasets, achieving notable speedups of 2.91× on MT-Bench and 2.99× on Alpaca, outperforming existing state-of-the-art methods.
%R 10.18653/v1/2025.findings-naacl.462
%U https://aclanthology.org/2025.findings-naacl.462/
%U https://doi.org/10.18653/v1/2025.findings-naacl.462
%P 8260-8273
Markdown (Informal)
[Hierarchical Speculative Decoding with Dynamic Window](https://aclanthology.org/2025.findings-naacl.462/) (Syu & Lee, Findings 2025)
ACL