Deep Equilibrium Non-Autoregressive Sequence Learning

Zaixiang Zheng, Yi Zhou, Hao Zhou


Abstract
In this work, we argue that non-autoregressive (NAR) sequence generative models can equivalently be regarded as an iterative refinement process towards the target sequence, implying an underlying dynamical system of the NAR model: z = f(z, x) → y. In this view, the optimal prediction of a NAR model would be the equilibrium state of its dynamics, given infinitely many iterations. However, this is infeasible in practice due to limited computational and memory budgets. To this end, we propose DEQNAR, which directly solves for the equilibrium state of NAR models based on deep equilibrium networks (Bai et al., 2019) with black-box root-finding solvers, and back-propagates through the equilibrium point via implicit differentiation with constant memory. We conduct extensive experiments on four WMT machine translation benchmarks. Our main findings show that DEQNAR does converge to more accurate predictions and is a general-purpose framework that consistently yields substantial improvements for several strong NAR backbones.
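The sketch below illustrates, in PyTorch, the deep equilibrium recipe the abstract refers to (Bai et al., 2019): solve z* = f(z*, x) with a black-box fixed-point solver in the forward pass, then back-propagate through z* via implicit differentiation so memory stays constant in the number of solver iterations. It is not the authors' DEQNAR implementation; `ToyLayer`, the naive forward-iteration solver, and all hyperparameters are illustrative assumptions, with the toy layer standing in for the NAR decoder block.

```python
import torch
from torch import autograd, nn


def forward_iteration(f, z0, max_iter=50, tol=1e-4):
    """Stand-in for a black-box solver: iterate z <- f(z) until (approximate) convergence."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if (z_next - z).norm() < tol:
            return z_next
        z = z_next
    return z


class ToyLayer(nn.Module):
    """Illustrative stand-in for one refinement step of the NAR decoder: z_{k+1} = f(z_k, x)."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, z, x):
        return torch.tanh(self.linear(z) + x)


class DEQFixedPoint(nn.Module):
    def __init__(self, f, solver=forward_iteration):
        super().__init__()
        self.f = f
        self.solver = solver

    def forward(self, x):
        # Forward: find the equilibrium z* without tracking the solver's iterations.
        with torch.no_grad():
            z_star = self.solver(lambda z: self.f(z, x), torch.zeros_like(x))
        # One extra step to re-attach z* to the autograd graph.
        z_star = self.f(z_star, x)

        # Backward: by the implicit function theorem the incoming gradient g satisfies
        # g = grad + (df/dz*)^T g, which we solve with the same black-box solver
        # using only vector-Jacobian products (hence constant memory).
        z0 = z_star.clone().detach().requires_grad_()
        f0 = self.f(z0, x)

        def backward_hook(grad):
            return self.solver(
                lambda g: autograd.grad(f0, z0, g, retain_graph=True)[0] + grad, grad
            )

        if z_star.requires_grad:
            z_star.register_hook(backward_hook)
        return z_star


# Usage sketch: a scalar loss back-propagates through the equilibrium point.
if __name__ == "__main__":
    deq = DEQFixedPoint(ToyLayer(dim=16))
    x = torch.randn(4, 16)
    loss = deq(x).pow(2).mean()
    loss.backward()  # parameter gradients obtained via implicit differentiation
```

The backward hook reuses the solver to solve the linear fixed-point equation for the gradient, so the memory cost is independent of how many refinement iterations the forward solve takes.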
Anthology ID:
2023.findings-acl.747
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11763–11781
URL:
https://aclanthology.org/2023.findings-acl.747
DOI:
10.18653/v1/2023.findings-acl.747
Cite (ACL):
Zaixiang Zheng, Yi Zhou, and Hao Zhou. 2023. Deep Equilibrium Non-Autoregressive Sequence Learning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11763–11781, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Deep Equilibrium Non-Autoregressive Sequence Learning (Zheng et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.747.pdf