%0 Conference Proceedings
%T GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
%A Ainslie, Joshua
%A Lee-Thorp, James
%A de Jong, Michiel
%A Zemlyanskiy, Yury
%A Lebron, Federico
%A Sanghai, Sumit
%Y Bouamor, Houda
%Y Pino, Juan
%Y Bali, Kalika
%S Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
%D 2023
%8 December
%I Association for Computational Linguistics
%C Singapore
%F ainslie-etal-2023-gqa
%R 10.18653/v1/2023.emnlp-main.298
%U https://aclanthology.org/2023.emnlp-main.298/
%U https://doi.org/10.18653/v1/2023.emnlp-main.298
%P 4895-4901