On Orthogonality Constraints for Transformers

Aston Zhang; Alvin Chan; Yi Tay; Jie Fu; Shuohang Wang; Shuai Zhang; Huajie Shao; Shuochao Yao; Roy Ka-Wei Lee

doi:10.18653/v1/2021.acl-short.48

Abstract

Orthogonality constraints encourage matrices to be orthogonal for numerical stability. These plug-and-play constraints, which can be conveniently incorporated into model training, have been studied for popular architectures in natural language processing, such as convolutional neural networks and recurrent neural networks. However, a dedicated study of such constraints for transformers has been absent. To fill this gap, this paper studies orthogonality constraints for transformers and demonstrates their effectiveness with empirical evidence from ten machine translation tasks and two dialogue generation tasks. For example, on the large-scale WMT’16 En→De benchmark, simply plugging orthogonality constraints into the original transformer model (Vaswani et al., 2017) increases the BLEU score from 28.4 to 29.6, coming close to the 29.7 BLEU achieved by the very competitive dynamic convolution model (Wu et al., 2019).
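The abstract does not spell out the exact form of the constraint. One common soft formulation, used here as an assumption rather than as the paper's confirmed method, penalizes how far W^T W is from the identity using the squared Frobenius norm and adds that penalty to the task loss. This additive form is what makes such constraints plug-and-play. The sketch uses plain Python lists; the function names `gram`, `orthogonality_penalty`, and `regularized_loss` and the weight `lam` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # a matrix stored as a list of rows


def gram(w: Matrix) -> Matrix:
    """Return W^T W (d x d) for an n x d matrix W given as a list of rows."""
    d = len(w[0])
    return [[sum(row[i] * row[j] for row in w) for j in range(d)] for i in range(d)]


def orthogonality_penalty(w: Matrix) -> float:
    """Soft orthogonality penalty ||W^T W - I||_F^2.

    It is zero exactly when the columns of W are orthonormal.
    """
    g = gram(w)
    d = len(g)
    return sum(
        (g[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(d)
        for j in range(d)
    )


def regularized_loss(task_loss: float, weights: List[Matrix], lam: float = 1e-4) -> float:
    """Hypothetical helper: add the weighted penalty of every constrained matrix to the task loss.

    The default lam is an illustrative assumption, not a value taken from the paper.
    """
    return task_loss + lam * sum(orthogonality_penalty(w) for w in weights)
```

For an orthogonal matrix such as the 2 x 2 identity, the penalty is 0. Scaling one column by 2, as in [[2, 0], [0, 1]], gives (4 - 1)^2 = 9. In a real transformer, the same penalty would be computed with the training framework's tensor operations, so that gradients flow back into the weights.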

Anthology ID: 2021.acl-short.48
Volume: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Month: August
Year: 2021
Address: Online
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues: ACL | IJCNLP
Publisher: Association for Computational Linguistics
Pages: 375–382
URL: https://aclanthology.org/2021.acl-short.48/
DOI: 10.18653/v1/2021.acl-short.48
Cite (ACL): Aston Zhang, Alvin Chan, Yi Tay, Jie Fu, Shuohang Wang, Shuai Zhang, Huajie Shao, Shuochao Yao, and Roy Ka-Wei Lee. 2021. On Orthogonality Constraints for Transformers. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 375–382, Online. Association for Computational Linguistics.
Cite (Informal): On Orthogonality Constraints for Transformers (Zhang et al., ACL-IJCNLP 2021)
PDF: https://aclanthology.org/2021.acl-short.48.pdf
Video: https://aclanthology.org/2021.acl-short.48.mp4