Scheduled DropHead: A Regularization Method for Transformer Models

Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou, Ke Xu


Abstract
We introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of the transformer. In contrast to the conventional dropout mechanism, which randomly drops units or connections, DropHead drops entire attention heads during training to prevent the multi-head attention model from being dominated by a small portion of attention heads. This reduces the risk of overfitting and allows models to benefit more fully from multi-head attention. Given the interaction between multi-headedness and training dynamics, we further propose a novel dropout rate scheduler that adjusts the dropout rate of DropHead throughout training, resulting in a better regularization effect. Experimental results demonstrate that our proposed approach improves transformer models by 0.9 BLEU on the WMT14 En-De translation task and by around 1.0 accuracy point on various text classification tasks.
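
The following is a minimal sketch of the DropHead idea, not the authors' released implementation: it assumes per-head attention outputs shaped (batch, num_heads, seq_len, head_dim), uses standard inverted-dropout rescaling, and the function names and the linear ramp schedule are illustrative only (the paper's actual scheduler is described in the full text).

```python
import torch

def drop_head(attn_heads: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    """Zero out entire attention heads with probability p during training.

    attn_heads: per-head attention outputs, shape (batch, num_heads, seq_len, head_dim).
    """
    if not training or p <= 0.0:
        return attn_heads
    batch, num_heads = attn_heads.shape[:2]
    # One Bernoulli keep/drop decision per head and per example,
    # broadcast over the sequence and feature dimensions.
    keep = (torch.rand(batch, num_heads, 1, 1, device=attn_heads.device) >= p).to(attn_heads.dtype)
    # Inverted-dropout rescaling keeps the expected combined output of the heads unchanged.
    return attn_heads * keep / (1.0 - p)

def linear_drophead_schedule(step: int, total_steps: int, p_max: float) -> float:
    """Illustrative linear ramp of the DropHead rate from 0 to p_max.

    Only shows where a schedule plugs in: the rate is recomputed at every
    training step and passed to drop_head.
    """
    return p_max * min(step, total_steps) / total_steps
```

In use, the dropped heads' outputs would simply be excluded from the concatenation (or zeroed before it), so the remaining heads must carry the prediction on their own for that example.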
Anthology ID:
2020.findings-emnlp.178
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1971–1980
URL:
https://aclanthology.org/2020.findings-emnlp.178
DOI:
10.18653/v1/2020.findings-emnlp.178
Cite (ACL):
Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou, and Ke Xu. 2020. Scheduled DropHead: A Regularization Method for Transformer Models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1971–1980, Online. Association for Computational Linguistics.
Cite (Informal):
Scheduled DropHead: A Regularization Method for Transformer Models (Zhou et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.178.pdf
Data
IMDb Movie Reviews, SNLI, Yahoo! Answers