Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

Authors: Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai
Published: 2024-08
Venue: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Type: conference publication
Identifier: liu-etal-2024-speculative-decoding
DOI: 10.18653/v1/2024.findings-acl.179
URL: https://aclanthology.org/2024.findings-acl.179/
Pages: 3027-3043