Yijin Yang

2024

pdf bib abs
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
Jiazhao Li | Yijin Yang | Zhuofeng Wu | V.G.Vinod Vydiswaran | Chaowei Xiao
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Textual backdoor attacks, characterized by subtle manipulations of input triggers and training dataset labels, pose significant threats to security-sensitive applications. The rise of advanced generative models, such as GPT-4, with their capacity for human-like rewriting, makes these attacks increasingly challenging to detect. In this study, we conduct an in-depth examination of black-box generative models as tools for backdoor attacks, thereby emphasizing the need for effective defense strategies. We propose BGMAttack, a novel framework that harnesses advanced generative models to execute stealthier backdoor attacks on text classifiers. Unlike prior approaches constrained by subpar generation quality, BGMAttack renders backdoor triggers more elusive to human cognition and advanced machine detection. A rigorous evaluation of attack effectiveness over four sentiment classification tasks, complemented by four human cognition stealthiness tests, reveals BGMAttack’s superior performance, achieving a state-of-the-art attack success rate of 97.35% on average while maintaining superior stealth compared to conventional methods. The dataset and code are available: https://github.com/JiazhaoLi/BGMAttack.

Co-authors

Venues

naacl1

Fix data