%0 Conference Proceedings
%T Reservoir Transformers
%A Shen, Sheng
%A Baevski, Alexei
%A Morcos, Ari
%A Keutzer, Kurt
%A Auli, Michael
%A Kiela, Douwe
%Y Zong, Chengqing
%Y Xia, Fei
%Y Li, Wenjie
%Y Navigli, Roberto
%S Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F shen-etal-2021-reservoir
%X We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear “reservoir” layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
%R 10.18653/v1/2021.acl-long.331
%U https://aclanthology.org/2021.acl-long.331
%U https://doi.org/10.18653/v1/2021.acl-long.331
%P 4294-4309