CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

Authors: Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng
Published: 2024-08
Venue: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Type: conference publication
Pages: 7293-7307
Citation key: zou-etal-2024-cqil
DOI: 10.18653/v1/2024.acl-long.394
URL: https://aclanthology.org/2024.acl-long.394/