Title: FlattenQuant: Breaking through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Authors: Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan
Date: 2024-05
Resource type: text
Genre: conference publication
Published in: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: ELRA and ICCL
Place: Torino, Italia
Pages: 7356-7365
Anthology ID: zhang-etal-2024-flattenquant
URL: https://aclanthology.org/2024.lrec-main.648/