%0 Conference Proceedings
%T Gradient-based Adversarial Attacks against Text Transformers
%A Guo, Chuan
%A Sablayrolles, Alexandre
%A Jégou, Hervé
%A Kiela, Douwe
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F guo-etal-2021-gradient
%X We propose the first general-purpose gradient-based adversarial attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks, outperforming prior work in terms of adversarial success rate with matching imperceptibility as per automated and human evaluation. Furthermore, we show that a powerful black-box transfer attack, enabled by sampling from the adversarial distribution, matches or exceeds existing methods, while only requiring hard-label outputs.
%R 10.18653/v1/2021.emnlp-main.464
%U https://aclanthology.org/2021.emnlp-main.464
%U https://doi.org/10.18653/v1/2021.emnlp-main.464
%P 5747-5757