What’s in Your Head? Emergent Behaviour in Multi-Task Transformer Models

Mor Geva; Uri Katz; Aviv Ben-Arie; Jonathan Berant

doi:10.18653/v1/2021.emnlp-main.646

Abstract

The primary paradigm for multi-task training in natural language processing is to represent the input with a shared pre-trained language model, and add a small, thin network (head) per task. Given an input, a target head is the head that is selected for outputting the final prediction. In this work, we examine the behaviour of non-target heads, that is, the output of heads when given input that belongs to a different task than the one they were trained for. We find that non-target heads exhibit emergent behaviour, which may either explain the target task, or generalize beyond their original task. For example, in a numerical reasoning task, a span extraction head extracts from the input the arguments to a computation that results in a number generated by a target generative head. In addition, a summarization head that is trained with a target question answering head outputs query-based summaries when given a question and a context from which the answer is to be extracted. This emergent behaviour suggests that multi-task training leads to non-trivial extrapolation of skills, which can be harnessed for interpretability and generalization.

Anthology ID: 2021.emnlp-main.646
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 8201–8215
URL: https://aclanthology.org/2021.emnlp-main.646
DOI: 10.18653/v1/2021.emnlp-main.646
Cite (ACL): Mor Geva, Uri Katz, Aviv Ben-Arie, and Jonathan Berant. 2021. What’s in Your Head? Emergent Behaviour in Multi-Task Transformer Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8201–8215, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): What’s in Your Head? Emergent Behaviour in Multi-Task Transformer Models (Geva et al., EMNLP 2021)
PDF: https://aclanthology.org/2021.emnlp-main.646.pdf
Video: https://aclanthology.org/2021.emnlp-main.646.mp4
Data: DROP
