Exploring Robust Overfitting for Pre-trained Language Models

Bin Zhu, Yanghui Rao


Abstract
We identify the robust overfitting issue for pre-trained language models by showing that the robust test loss increases as the number of training epochs grows. Through a comprehensive exploration of the robust loss on the training set, we attribute robust overfitting to the model's memorization of the adversarial training data. We attempt to mitigate robust overfitting by combining regularization methods with adversarial training. Following the principle of preventing the model from memorizing the adversarial data, we find that flooding, a regularization method with loss scaling, can mitigate robust overfitting for pre-trained language models. Finally, we investigate the effect of different flooding levels and evaluate the models' adversarial robustness under textual attacks. Extensive experiments demonstrate that our methods can mitigate robust overfitting on top of three strong adversarial training methods and further improve adversarial robustness.
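For readers unfamiliar with flooding (Ishida et al., 2020): the regularizer replaces the training loss J with |J − b| + b, where b is the flood level, so once the loss falls below b the gradient direction flips and the loss hovers around b instead of being driven toward zero, which discourages memorization of the (adversarial) training data. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the function name and the flood_level value 0.1 are illustrative assumptions, not settings from the paper.

```python
import torch
import torch.nn.functional as F

def flooding_cross_entropy(logits: torch.Tensor,
                           labels: torch.Tensor,
                           flood_level: float = 0.1) -> torch.Tensor:
    """Cross-entropy loss with flooding: |J - b| + b.

    While J > b this behaves exactly like the plain loss; once J < b,
    the absolute value flips the gradient sign, pushing the loss back
    up toward the flood level b instead of letting it reach zero.
    """
    loss = F.cross_entropy(logits, labels)  # ordinary training loss J
    return (loss - flood_level).abs() + flood_level
```

In adversarial training, this loss would simply be computed on the adversarial examples in place of the standard objective; choosing the flood level b is the tuning question the paper's experiments on flooding levels address.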
Anthology ID:
2023.findings-acl.340
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5506–5522
URL:
https://aclanthology.org/2023.findings-acl.340
DOI:
10.18653/v1/2023.findings-acl.340
Cite (ACL):
Bin Zhu and Yanghui Rao. 2023. Exploring Robust Overfitting for Pre-trained Language Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5506–5522, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Exploring Robust Overfitting for Pre-trained Language Models (Zhu & Rao, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.340.pdf