Title: PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
Authors: Dongjie Yang, Xiaodong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao
Published in: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Pages: 3258-3270
Resource type: text
Genre: conference publication
Identifier: yang-etal-2024-pyramidinfer
DOI: 10.18653/v1/2024.findings-acl.195
URL: https://aclanthology.org/2024.findings-acl.195/