ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, Min Zhang


Abstract
Residual networks are an Euler discretization of solutions to Ordinary Differential Equations (ODE). This paper explores a deeper relationship between Transformer and numerical ODE methods. We first show that a residual block of layers in Transformer can be described as a higher-order solution to ODE. Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE. As a natural extension to Transformer, ODE Transformer is easy to implement and efficient to use. Experimental results on the large-scale machine translation, abstractive summarization, and grammar error correction tasks demonstrate the high genericity of ODE Transformer. It can gain large improvements in model performance over strong baselines (e.g., 30.77 and 44.11 BLEU scores on the WMT’14 English-German and English-French benchmarks) at a slight cost in inference efficiency.
Anthology ID:
2022.acl-long.571
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
8335–8351
Language:
URL:
https://aclanthology.org/2022.acl-long.571
DOI:
10.18653/v1/2022.acl-long.571
Bibkey:
Cite (ACL):
Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, JingBo Zhu, Xuebo Liu, and Min Zhang. 2022. ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8335–8351, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation (Li et al., ACL 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.acl-long.571.pdf
Software:
 2022.acl-long.571.software.zip
Code
 libeineu/ode-transformer