GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Authors: Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebron, Sumit Sanghai
In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4895–4901
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics, Singapore
Date: December 2023
Type: conference publication
Citation key: ainslie-etal-2023-gqa
DOI: 10.18653/v1/2023.emnlp-main.298
URL: https://aclanthology.org/2023.emnlp-main.298/