Finding the Pillars of Strength for Multi-Head Attention

Jinjie Ni, Rui Mao, Zonglin Yang, Han Lei, Erik Cambria


Abstract
Recent studies have revealed some issues of Multi-Head Attention (MHA), e.g., redundancy and over-parameterization. Specifically, the heads of MHA were originally designed to attend to information from different representation subspaces, whereas prior studies found that some attention heads likely learn similar features and can be pruned without harming performance. Inspired by minimum-redundancy feature selection, we assume that focusing on the most representative and distinctive features with minimum resources can mitigate the above issues and lead to more effective and efficient MHAs. In particular, we propose Grouped Head Attention, trained with a self-supervised group constraint that groups attention heads, where each group focuses on an essential but distinctive feature subset. We additionally propose a Voting-to-Stay procedure to remove redundant heads, thus achieving a transformer with lighter weights. Extensive experiments are consistent with our hypothesis. Moreover, our method achieves significant performance gains on three well-established tasks while considerably compressing parameters.
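For intuition only, below is a minimal, hypothetical PyTorch sketch of the two ideas named in the abstract: a group constraint that encourages heads within a group to align while keeping groups distinct, and a voting-style step that keeps only the strongest heads in each group. The loss form, the head-scoring, and the names `group_constraint_loss` and `vote_to_keep` are illustrative assumptions, not the authors' formulation from the paper.

```python
# Hypothetical sketch of grouping attention heads and pruning redundant ones.
# NOT the paper's exact method; the cosine-similarity penalty and the
# per-group "voting" by head score are illustrative assumptions only.
import torch
import torch.nn.functional as F


def group_constraint_loss(head_outputs: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Push heads within a group together while keeping groups distinct.

    head_outputs: (num_heads, batch, seq_len, d_head) per-head context vectors.
    """
    num_heads = head_outputs.size(0)
    group_size = num_heads // num_groups
    # Flatten each head's output to one vector for a cosine-similarity comparison.
    flat = F.normalize(head_outputs.reshape(num_heads, -1), dim=-1)
    sim = flat @ flat.t()  # (num_heads, num_heads) pairwise similarities

    group_ids = torch.arange(num_heads) // group_size
    same_group = group_ids.unsqueeze(0) == group_ids.unsqueeze(1)
    eye = torch.eye(num_heads, dtype=torch.bool)

    intra = sim[same_group & ~eye].mean()  # want high: heads in a group agree
    inter = sim[~same_group].mean()        # want low: groups stay distinctive
    return inter - intra


def vote_to_keep(head_scores: torch.Tensor, keep_per_group: int, num_groups: int):
    """Toy voting step: keep the top-scoring heads inside each group."""
    num_heads = head_scores.size(0)
    group_size = num_heads // num_groups
    keep = []
    for g in range(num_groups):
        scores = head_scores[g * group_size:(g + 1) * group_size]
        top = torch.topk(scores, keep_per_group).indices + g * group_size
        keep.extend(top.tolist())
    return sorted(keep)


if __name__ == "__main__":
    heads, batch, seq, d_head = 8, 2, 5, 16
    outputs = torch.randn(heads, batch, seq, d_head)
    print("constraint loss:", group_constraint_loss(outputs, num_groups=4).item())
    print("kept heads:", vote_to_keep(torch.rand(heads), keep_per_group=1, num_groups=4))
```

In this toy setup, the retained heads would form the lighter-weight attention module; the paper's actual group constraint and Voting-to-Stay procedure are described in the full text linked below.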
Anthology ID:
2023.acl-long.812
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
14526–14540
URL:
https://aclanthology.org/2023.acl-long.812
DOI:
10.18653/v1/2023.acl-long.812
Cite (ACL):
Jinjie Ni, Rui Mao, Zonglin Yang, Han Lei, and Erik Cambria. 2023. Finding the Pillars of Strength for Multi-Head Attention. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14526–14540, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Finding the Pillars of Strength for Multi-Head Attention (Ni et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.812.pdf
Video:
https://aclanthology.org/2023.acl-long.812.mp4