%0 Conference Proceedings
%T Reservoir Transformers
%A Shen, Sheng
%A Baevski, Alexei
%A Morcos, Ari
%A Keutzer, Kurt
%A Auli, Michael
%A Kiela, Douwe
%Y Zong, Chengqing
%Y Xia, Fei
%Y Li, Wenjie
%Y Navigli, Roberto
%S Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F shen-etal-2021-reservoir
%X We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear “reservoir” layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
%R 10.18653/v1/2021.acl-long.331
%U https://aclanthology.org/2021.acl-long.331
%U https://doi.org/10.18653/v1/2021.acl-long.331
%P 4294-4309