Title: Mixture of Attention Heads: Selecting Attention Heads Per Token
Authors: Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
Date: 2022-12
Type: conference publication
Venue: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Publisher: Association for Computational Linguistics
Location: Abu Dhabi, United Arab Emirates
Pages: 4150-4162
Anthology ID: zhang-etal-2022-mixture
DOI: 10.18653/v1/2022.emnlp-main.278
URL: https://aclanthology.org/2022.emnlp-main.278/