Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks

Zhen Yu; Zhenhua Chen; Kun He


Abstract

Deep neural networks for Natural Language Processing (NLP) have been demonstrated to be vulnerable to textual adversarial examples. Existing black-box attacks typically require thousands of queries on the target model, making them expensive in real-world applications. In this paper, we propose a new approach that guides word substitutions using prior knowledge from the training set to improve attack efficiency. Specifically, we introduce Adversarial Boosting Preference (ABP), a metric that quantifies the importance of words and guides adversarial word substitutions. We then propose two query-efficient attack strategies based on ABP: a query-free attack (ABP_free) and a guided search attack (ABP_guide). Extensive evaluations on text classification demonstrate that ABP_free generates more natural adversarial examples than existing universal attacks, and that ABP_guide reduces the number of queries by a factor of 10 to 500 while achieving comparable or even better performance than black-box attack baselines. Furthermore, we introduce ABP_ens, the first ensemble attack in NLP, which further improves performance and achieves better transferability and generalization by ensembling ABP across different models and domains. Code is available at https://github.com/BaiDingHub/ABP.

Anthology ID: 2024.naacl-long.31
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 556–569
URL: https://aclanthology.org/2024.naacl-long.31
Cite (ACL): Zhen Yu, Zhenhua Chen, and Kun He. 2024. Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 556–569, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks (Yu et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.31.pdf
