Improving the Adversarial Robustness of NLP Models by Information Bottleneck

Cenyuan Zhang; Xiang Zhou; Yixin Wan; Xiaoqing Zheng; Kai-Wei Chang; Cho-Jui Hsieh

doi:10.18653/v1/2022.findings-acl.284

Abstract

Existing studies have demonstrated that adversarial examples can be directly attributed to the presence of non-robust features, which are highly predictive but can be easily manipulated by adversaries to fool NLP models. In this study, we explore the feasibility of capturing task-specific robust features while eliminating the non-robust ones by using information bottleneck theory. Through extensive experiments, we show that models trained with our information bottleneck-based method achieve a significant improvement in robust accuracy, exceeding the performance of all previously reported defense methods while suffering almost no drop in clean accuracy on the SST-2, AGNEWS and IMDB datasets.
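For readers unfamiliar with the term, the information bottleneck principle is usually stated as a trade-off between two mutual-information terms. The formula below is the standard textbook form, given only as orientation; the symbols X (input text), Y (task label), Z (learned representation) and the trade-off weight β are assumptions introduced here, and the exact objective used in the paper may differ.

```latex
\max_{\theta} \; I(Z; Y) \;-\; \beta \, I(X; Z), \qquad \beta \ge 0
```

Under this reading, maximizing I(Z; Y) keeps the features that predict the label, while penalizing I(X; Z) compresses away input detail that is not needed for the task, which is where non-robust features would be expected to be discarded.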

Anthology ID: 2022.findings-acl.284
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3588–3598
URL: https://aclanthology.org/2022.findings-acl.284/
DOI: 10.18653/v1/2022.findings-acl.284
Cite (ACL): Cenyuan Zhang, Xiang Zhou, Yixin Wan, Xiaoqing Zheng, Kai-Wei Chang, and Cho-Jui Hsieh. 2022. Improving the Adversarial Robustness of NLP Models by Information Bottleneck. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3588–3598, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Improving the Adversarial Robustness of NLP Models by Information Bottleneck (Zhang et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.284.pdf
Software: 2022.findings-acl.284.software.zip
