Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks

Zhen Yu, Zhenhua Chen, Kun He


Abstract
Deep neural networks for Natural Language Processing (NLP) have been shown to be vulnerable to textual adversarial examples. Existing black-box attacks typically require thousands of queries to the target model, which makes them expensive in real-world applications. In this paper, we propose a new approach that uses prior knowledge from the training set to guide word substitutions and thereby improve attack efficiency. Specifically, we introduce Adversarial Boosting Preference (ABP), a metric that quantifies the importance of words and guides adversarial word substitutions. Based on ABP, we then propose two query-efficient attack strategies: a query-free attack (ABPfree) and a guided search attack (ABPguide). Extensive evaluations on text classification demonstrate that ABPfree generates more natural adversarial examples than existing universal attacks, and that ABPguide reduces the number of queries by a factor of 10 to 500 while achieving comparable or even better performance than black-box attack baselines. Furthermore, we introduce ABPens, the first ensemble attack in NLP, which ensembles ABP across different models and domains to gain further performance improvements and achieve better transferability and generalization. Code is available at https://github.com/BaiDingHub/ABP.
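The abstract does not give the ABP formula, so the sketch below only illustrates the general idea it describes: rank candidate word substitutions with a score computed in advance (playing the role ABP plays in the paper), then query the black-box model once per applied substitution and stop as soon as the label flips, so that a good prior keeps the query count low. The function name `guided_substitution_attack`, the `prior` dictionary, the `candidates` map, and the toy classifier are all hypothetical placeholders, not the authors' implementation or their definition of ABP.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prior: (original_word, substitute_word) -> precomputed score.
# In the paper this role is played by ABP, estimated from the training set;
# the scores used here are placeholders, not the paper's definition.
Prior = Dict[Tuple[str, str], float]


def guided_substitution_attack(
    tokens: List[str],
    candidates: Dict[str, List[str]],
    prior: Prior,
    predict: Callable[[List[str]], int],
    true_label: int,
    max_queries: int = 50,
) -> Tuple[List[str], int, bool]:
    """Greedily apply substitutions in descending prior-score order,
    spending one black-box query per applied substitution."""
    # Rank every (position, substitute) pair using the prior only: no queries.
    ranked = sorted(
        ((prior.get((tok, sub), 0.0), i, sub)
         for i, tok in enumerate(tokens)
         for sub in candidates.get(tok, [])),
        key=lambda t: t[0],
        reverse=True,
    )
    adv = list(tokens)
    used_positions = set()
    queries = 0
    for _score, i, sub in ranked:
        if i in used_positions:
            continue  # allow at most one substitution per position
        if queries >= max_queries:
            break  # query budget exhausted
        adv[i] = sub
        used_positions.add(i)
        queries += 1
        if predict(adv) != true_label:
            return adv, queries, True  # label flipped: attack succeeded
    return adv, queries, False


if __name__ == "__main__":
    # Toy classifier (hypothetical): label 1 iff the word "great" is present.
    def toy_predict(toks: List[str]) -> int:
        return 1 if "great" in toks else 0

    result = guided_substitution_attack(
        ["the", "movie", "was", "great"],
        {"great": ["fine", "okay"], "movie": ["film"]},
        {("great", "okay"): 0.9, ("great", "fine"): 0.5, ("movie", "film"): 0.1},
        toy_predict,
        true_label=1,
    )
    print(result)  # (['the', 'movie', 'was', 'okay'], 1, True)
```

Because the ranking is computed without touching the model, every query is spent on the substitution the prior considers most promising; a purely query-free variant in the spirit of ABPfree would instead apply the top-ranked substitutions without calling `predict` at all.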
Anthology ID: 2024.naacl-long.31
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 556–569
URL: https://aclanthology.org/2024.naacl-long.31
Cite (ACL): Zhen Yu, Zhenhua Chen, and Kun He. 2024. Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 556–569, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Query-Efficient Textual Adversarial Example Generation for Black-Box Attacks (Yu et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-long.31.pdf
Copyright: 2024.naacl-long.31.copyright.pdf