Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning

Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong


Abstract
Transformer-based models, despite achieving super-human performance on several downstream tasks, are often regarded as black boxes and used as a whole. It remains unclear what mechanisms they have learned, especially in their core module: multi-head attention. Inspired by functional specialization in the human brain, which helps to handle multiple tasks efficiently, this work investigates whether the multi-head attention module develops a similar separation of functions under multi-task training and, if so, whether this mechanism can further improve model performance. To answer these questions, we introduce an interpreting method to quantify the degree of functional specialization in multi-head attention. We further propose a simple multi-task training method to increase functional specialization and mitigate negative information transfer in multi-task learning. Experimental results on seven pre-trained transformer models demonstrate that multi-head attention does develop functional specialization after multi-task training, and that the degree of specialization is affected by the similarity of the tasks. Moreover, the multi-task training strategy based on functional specialization boosts performance in both multi-task learning and transfer learning without adding any parameters.
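The abstract does not spell out how functional specialization is quantified. As a purely illustrative reading, the minimal PyTorch sketch below computes per-task head-importance scores via gate-based gradient attribution and compares them across tasks with a cosine-distance score; both choices, along with the names `head_importance`, `specialization`, and the hypothetical `loss_fn` helper, are assumptions for illustration and not the method proposed in the paper.

```python
import torch

def head_importance(model, loss_fn, batch, num_layers, num_heads):
    """Estimate how much each attention head matters for one task's loss.

    Multiplies every head's output by a gate initialized to 1 and uses the
    absolute gradient of the task loss w.r.t. the gates as an importance
    proxy (an assumed attribution choice, not taken from the paper).
    `loss_fn` is a hypothetical helper that applies the gates inside the model.
    """
    gates = torch.ones(num_layers, num_heads, requires_grad=True)
    loss = loss_fn(model, batch, gates)
    loss.backward()
    return gates.grad.abs()          # shape: [num_layers, num_heads]

def specialization(importance_a, importance_b):
    """1 - cosine similarity between two tasks' flattened importance maps:
    higher values mean the tasks rely on more disjoint sets of heads."""
    a, b = importance_a.flatten(), importance_b.flatten()
    cos = torch.nn.functional.cosine_similarity(a, b, dim=0)
    return (1.0 - cos).item()
```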
Anthology ID:
2023.emnlp-main.1026
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
16460–16476
URL:
https://aclanthology.org/2023.emnlp-main.1026
DOI:
10.18653/v1/2023.emnlp-main.1026
Cite (ACL):
Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, and Chengqing Zong. 2023. Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16460–16476, Singapore. Association for Computational Linguistics.
Cite (Informal):
Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.1026.pdf
Video:
https://aclanthology.org/2023.emnlp-main.1026.mp4