Title: QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
Authors: Jingyao Li, Han Shi, Sitong Wu, Chuanyang Zheng, Zhenguo Li, Xin Jiang, Hong Xu, Jiaya Jia
Date: 2025-01
Resource type: text
In: Proceedings of the 31st International Conference on Computational Linguistics
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher: Association for Computational Linguistics
Location: Abu Dhabi, UAE
Genre: conference publication
Identifier: li-etal-2025-quickllama
URL: https://aclanthology.org/2025.coling-main.34/
Pages: 508-528