Rethinking Skip Connection with Layer Normalization

Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, Yuexian Zou


Abstract
Skip connection is a widely used technique to improve the performance and convergence of deep neural networks; it is believed to ease the optimization difficulty caused by non-linearity by propagating a linear component through the network layers. However, from another point of view, it can also be seen as a modulating mechanism between the input and the output, with the input scaled by a pre-defined value of one. In this work, we investigate how the scale factors into the effectiveness of the skip connection and reveal that a trivial adjustment of the scale leads to spurious gradient exploding or vanishing in line with the depth of the models. This can be addressed by normalization, in particular layer normalization, which induces consistent improvements over the plain skip connection. Inspired by these findings, we further propose to adaptively adjust the scale of the input by recursively applying skip connection with layer normalization, which improves performance substantially and generalizes well across diverse tasks, including both machine translation and image classification datasets.
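The three variants named in the abstract (a skip connection with a scaled input, the same followed by layer normalization, and skip connection with layer normalization applied recursively) can be illustrated with a minimal NumPy sketch. This is one plausible reading of the abstract, not the authors' implementation: the function names, the toy sublayer `f`, the `steps` parameter, and the layer normalization without learnable gain and bias are all assumptions made for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) axis; no learnable gain/bias (assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def scaled_skip(x, f, scale=1.0):
    # Skip connection with the input scaled by `scale`; scale=1.0 is the plain form.
    return scale * x + f(x)

def skip_with_ln(x, f, scale=1.0):
    # Scaled skip connection followed by layer normalization.
    return layer_norm(scale * x + f(x))

def recursive_skip_with_ln(x, f, steps=2):
    # Hypothetical reading of "recursively applying skip connection with layer
    # normalization": re-add the input x and re-normalize `steps` times.
    y = f(x)
    for _ in range(steps):
        y = layer_norm(x + y)
    return y

# Toy sublayer standing in for attention / feed-forward / convolution (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) / np.sqrt(8)
f = lambda h: np.tanh(h @ W)

x = rng.normal(size=(4, 8))            # 4 positions, 8 features
plain = scaled_skip(x, f)              # x + f(x)
normed = skip_with_ln(x, f, scale=2.0) # LN(2x + f(x))
recursive = recursive_skip_with_ln(x, f, steps=2)
```

Under this reading, `steps=1` reduces to the ordinary skip connection with layer normalization, and each further step gives the input another chance to be mixed into the output before re-normalization.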
Anthology ID:
2020.coling-main.320
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3586–3598
URL:
https://aclanthology.org/2020.coling-main.320
DOI:
10.18653/v1/2020.coling-main.320
Cite (ACL):
Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, and Yuexian Zou. 2020. Rethinking Skip Connection with Layer Normalization. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3586–3598, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Rethinking Skip Connection with Layer Normalization (Liu et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.320.pdf
Data:
CIFAR-10, CIFAR-100