PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks

Pengwei Zhan, Chao Zheng, Jing Yang, Yuxiang Wang, Liming Wang, Yang Wu, Yunjian Zhang


Abstract
Neural networks are vulnerable to adversarial examples. The adversary can successfully attack a model even without knowing model architecture and parameters, i.e., under a black-box scenario. Previous works on word-level attacks widely use word importance ranking (WIR) methods and complex search methods, including greedy search and heuristic algorithms, to find optimal substitutions. However, these methods fail to balance the attack success rate and the cost of attacks, such as the number of queries to the model and the time consumption. In this paper, We propose PAthological woRd Saliency sEarch (PARSE) that performs the search under dynamic search space following the subarea importance. Experiments show that PARSE can achieve comparable attack success rates to complex search methods while saving numerous queries and time, e.g., saving at most 74% of queries and 90% of time compared with greedy search when attacking the examples from Yelp dataset. The adversarial examples crafted by PARSE are also of high quality, highly transferable, and can effectively improve model robustness in adversarial training.
Anthology ID:
2022.coling-1.423
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Editors:
Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
4776–4787
Language:
URL:
https://aclanthology.org/2022.coling-1.423
DOI:
Bibkey:
Cite (ACL):
Pengwei Zhan, Chao Zheng, Jing Yang, Yuxiang Wang, Liming Wang, Yang Wu, and Yunjian Zhang. 2022. PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4776–4787, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks (Zhan et al., COLING 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.coling-1.423.pdf
Data
Yelp Review Polarity