Title: Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent
Authors: William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith
Date issued: 2021-11
Type: text
Genre: conference publication
Published in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Publisher: Association for Computational Linguistics
Location: Online and Punta Cana, Dominican Republic
Pages: 1766-1781
Identifier: merrill-etal-2021-effects
DOI: 10.18653/v1/2021.emnlp-main.133
URL: https://aclanthology.org/2021.emnlp-main.133/