Chao Zheng


Rethinking Word-level Adversarial Attack: The Trade-off between Efficiency, Effectiveness, and Imperceptibility
Pengwei Zhan | Jing Yang | He Wang | Chao Zheng | Liming Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Neural language models have demonstrated impressive performance in various tasks but remain vulnerable to word-level adversarial attacks. Word-level adversarial attacks can be formulated as a combinatorial optimization problem, and thus, an attack method can be decomposed into search space and search method. Despite the significance of these two components, previous works inadequately distinguish them, which may lead to unfair comparisons and insufficient evaluations. In this paper, to address the inappropriate practices in previous works, we perform thorough ablation studies on the search space, illustrating the substantial influence of search space on attack efficiency, effectiveness, and imperceptibility. Based on the ablation study, we propose two standardized search spaces: the Search Space for ImPerceptibility (SSIP) and Search Space for EffecTiveness (SSET). The reevaluation of eight previous attack methods demonstrates the success of SSIP and SSET in achieving better trade-offs between efficiency, effectiveness, and imperceptibility in different scenarios, offering fair and comprehensive evaluations of previous attack methods and providing potential guidance for future works.


Similarizing the Influence of Words with Contrastive Learning to Defend Word-level Adversarial Text Attack
Pengwei Zhan | Jing Yang | He Wang | Chao Zheng | Xiao Huang | Liming Wang
Findings of the Association for Computational Linguistics: ACL 2023

Neural language models are vulnerable to word-level adversarial text attacks, which generate adversarial examples by directly substituting discrete input words. Previous search methods for word-level attacks assume that the information in important words is more influential on prediction than that in unimportant words. In this paper, motivated by this assumption, we propose a self-supervised regularization method for Similarizing the Influence of Words with Contrastive Learning (SIWCon) that encourages the model to learn sentence representations in which words of varying importance have a more uniform influence on prediction. Experiments show that SIWCon is compatible with various training methods and effectively improves model robustness against various unforeseen adversarial attacks. The effectiveness of SIWCon is also intuitively shown through qualitative analysis and visualization of the loss landscape, sentence representation, and changes in model confidence.


PARSE: An Efficient Search Method for Black-box Adversarial Text Attacks
Pengwei Zhan | Chao Zheng | Jing Yang | Yuxiang Wang | Liming Wang | Yang Wu | Yunjian Zhang
Proceedings of the 29th International Conference on Computational Linguistics

Neural networks are vulnerable to adversarial examples. The adversary can successfully attack a model even without knowing the model architecture and parameters, i.e., under a black-box scenario. Previous works on word-level attacks widely use word importance ranking (WIR) methods and complex search methods, including greedy search and heuristic algorithms, to find optimal substitutions. However, these methods fail to balance the attack success rate and the cost of attacks, such as the number of queries to the model and the time consumption. In this paper, we propose PAthological woRd Saliency sEarch (PARSE), which performs the search under a dynamic search space following the subarea importance. Experiments show that PARSE can achieve attack success rates comparable to those of complex search methods while saving numerous queries and time, e.g., saving up to 74% of queries and 90% of time compared with greedy search when attacking examples from the Yelp dataset. The adversarial examples crafted by PARSE are also of high quality, highly transferable, and can effectively improve model robustness in adversarial training.
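
For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of a generic word importance ranking (WIR) step followed by greedy substitution in a black-box setting. It is not PARSE and is not taken from the paper. The toy model `toy_positive_prob`, its weights, the `<unk>` mask token, the synonym table, and the 0.5 decision threshold are all illustrative assumptions. The `QueryCounter` wrapper only serves to show how the query cost mentioned in the abstract could be measured.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "black-box" sentiment model: only its output is used below.
WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.2, "okay": 0.1}
BIAS = -0.5


def toy_positive_prob(tokens: List[str]) -> float:
    """Return the probability of the 'positive' label for a token list."""
    score = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


class QueryCounter:
    """Wrap a model and count how many times it is queried."""

    def __init__(self, model: Callable[[List[str]], float]) -> None:
        self.model = model
        self.queries = 0

    def __call__(self, tokens: List[str]) -> float:
        self.queries += 1
        return self.model(tokens)


def rank_words(model, tokens: List[str], mask: str = "<unk>") -> Tuple[float, List[int]]:
    """Leave-one-out WIR: rank positions by the probability drop when masked."""
    base = model(tokens)
    drops = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        drops.append((base - model(masked), i))
    # Largest drop first; ties are broken by position.
    drops.sort(key=lambda d: (-d[0], d[1]))
    return base, [i for _, i in drops]


def greedy_attack(model, tokens: List[str], synonyms: Dict[str, List[str]],
                  threshold: float = 0.5) -> Tuple[List[str], bool]:
    """Substitute words in importance order, keeping the best candidate each time."""
    current = list(tokens)
    current_prob, order = rank_words(model, current)
    for i in order:
        best = None
        for cand in synonyms.get(tokens[i], []):
            trial = current[:i] + [cand] + current[i + 1:]
            prob = model(trial)
            if prob < current_prob and (best is None or prob < best[0]):
                best = (prob, trial)
        if best is not None:
            current_prob, current = best
        if current_prob < threshold:
            return current, True  # prediction flipped under the assumed threshold
    return current, False


if __name__ == "__main__":
    model = QueryCounter(toy_positive_prob)
    adv, success = greedy_attack(model, ["the", "food", "was", "great"],
                                 {"great": ["fine", "okay"]})
    print(adv, success, model.queries)
```

In this sketch the ranking step alone costs one query per word plus one for the original input, which illustrates why the number of queries grows quickly with sentence length and candidate count.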