Improving Non-Autoregressive Neural Machine Translation via Modeling Localness

Yong Wang, Xinwei Geng


Abstract
Non-autoregressive translation (NAT) models, which eliminate the sequential dependencies within the target sentence, achieve remarkable inference speed but suffer from inferior translation quality. To explore the underlying causes, we carry out a thorough preliminary study of the attention mechanism, which reveals a serious weakness in capturing localness compared with conventional autoregressive translation (AT). In response, we propose to improve the localness of NAT models by explicitly introducing information about surrounding words. Specifically, temporal convolutions are incorporated into both the encoder and decoder to obtain localness-aware representations. Extensive experiments on several typical translation datasets show that the proposed method achieves consistent and significant improvements over strong NAT baselines. Further analyses on the WMT14 En-De translation task reveal that, compared with the baselines, our approach accelerates convergence during training and can reach equivalent performance with 70% fewer training steps.
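The core idea of injecting localness via temporal convolutions can be sketched as follows. This is a minimal, dependency-free illustration of a depthwise 1-D convolution mixing each token's hidden state with its neighbours; the function name, window size, and averaging weights are assumptions for illustration, not the authors' actual architecture.

```python
def temporal_conv(hidden, kernel):
    """Mix each token's vector with its neighbours along the time axis.

    hidden: list of T token vectors, each a list of d floats
    kernel: list of k scalar weights (k odd); position t combines
            hidden[t - k//2 .. t + k//2], zero-padded at sentence edges
    """
    k = len(kernel)
    half = k // 2
    T = len(hidden)
    d = len(hidden[0])
    out = []
    for t in range(T):
        vec = [0.0] * d
        for j, w in enumerate(kernel):
            src = t + j - half
            if 0 <= src < T:  # positions outside the sentence contribute zero
                for i in range(d):
                    vec[i] += w * hidden[src][i]
        out.append(vec)
    return out

# Toy example: 4 tokens with 2-dim states, a window of 3 with uniform weights,
# yielding a localness-aware representation for each position.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
local = temporal_conv(states, [1 / 3, 1 / 3, 1 / 3])
```

In a real NAT model this mixing would be a learned convolution applied to the encoder and decoder hidden states, but the sketch shows the essential effect: each output vector now depends on a small window of surrounding positions rather than on attention alone.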
Anthology ID:
2022.coling-1.463
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
5217–5226
URL:
https://aclanthology.org/2022.coling-1.463
Cite (ACL):
Yong Wang and Xinwei Geng. 2022. Improving Non-Autoregressive Neural Machine Translation via Modeling Localness. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5217–5226, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Improving Non-Autoregressive Neural Machine Translation via Modeling Localness (Wang & Geng, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.463.pdf