Prediction Difference Regularization against Perturbation for Neural Machine Translation

Dengji Guo, Zhengrui Ma, Min Zhang, Yang Feng


Abstract
Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years. Despite their simplicity and effectiveness, we argue that these methods are limited by the under-fitting of training data. In this paper, we use the prediction difference for ground-truth tokens to analyze the fitting of token-level samples and find that under-fitting is almost as common as over-fitting. We introduce prediction difference regularization (PD-R), a simple and effective method that can reduce over-fitting and under-fitting at the same time. For all token-level samples, PD-R minimizes the prediction difference between the original pass and the input-perturbed pass, making the model less sensitive to small input changes and thus more robust to both perturbations and under-fitted training data. Experiments on three widely used WMT translation tasks show that our approach significantly improves over existing perturbation regularization methods. On the WMT16 En-De task, our model achieves a 1.80 SacreBLEU improvement over the vanilla Transformer.
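The core idea in the abstract, penalizing the gap between the original pass and the input-perturbed pass on ground-truth tokens, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `pd_regularized_loss`, the weight `alpha`, the use of the absolute probability difference as the distance, and the choice to apply cross-entropy to the original pass. The paper's actual loss may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pd_regularized_loss(clean_logits, perturbed_logits, targets, alpha=1.0):
    """Hypothetical PD-R-style objective (illustrative only).

    clean_logits / perturbed_logits: one list of vocabulary logits per target
    position, from the original pass and the input-perturbed pass.
    targets: ground-truth token index for each position.
    Returns (total_loss, mean_prediction_difference).
    """
    ce_sum = 0.0
    pd_sum = 0.0
    for clean, pert, y in zip(clean_logits, perturbed_logits, targets):
        p_clean = softmax(clean)[y]  # ground-truth probability, original pass
        p_pert = softmax(pert)[y]    # ground-truth probability, perturbed pass
        ce_sum += -math.log(p_clean)      # assumption: CE on the original pass
        pd_sum += abs(p_clean - p_pert)   # assumption: absolute difference
    n = len(targets)
    mean_pd = pd_sum / n
    return ce_sum / n + alpha * mean_pd, mean_pd


# Usage: one target position, two-word vocabulary, perturbation shifts the logits.
loss, pd = pd_regularized_loss([[0.0, 0.0]], [[math.log(3.0), 0.0]], [0])
```

When the two passes agree, the penalty term vanishes and the sketch reduces to plain cross-entropy; the larger the shift in ground-truth probability under perturbation, the larger the penalty.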
Anthology ID:
2022.acl-long.528
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7665–7675
URL:
https://aclanthology.org/2022.acl-long.528
DOI:
10.18653/v1/2022.acl-long.528
Cite (ACL):
Dengji Guo, Zhengrui Ma, Min Zhang, and Yang Feng. 2022. Prediction Difference Regularization against Perturbation for Neural Machine Translation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7665–7675, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Prediction Difference Regularization against Perturbation for Neural Machine Translation (Guo et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.528.pdf
Video:
https://aclanthology.org/2022.acl-long.528.mp4