Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding

Authors: Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, Sharad Mehrotra
Venue: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Type: conference publication
Pages: 11263-11282
DOI: 10.18653/v1/2024.acl-long.607
URL: https://aclanthology.org/2024.acl-long.607/
Citation key: zhang-etal-2024-draft