BITE: Textual Backdoor Attacks with Iterative Trigger Injection

Jun Yan, Vansh Gupta, Xiang Ren


Abstract
Backdoor attacks have become an emerging threat to NLP systems. By providing poisoned training data, the adversary can embed a “backdoor” into the victim model, which allows input instances satisfying certain textual patterns (e.g., containing a keyword) to be predicted as a target label of the adversary’s choice. In this paper, we demonstrate that it is possible to design a backdoor attack that is both stealthy (i.e., hard to notice) and effective (i.e., has a high attack success rate). We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and a set of “trigger words”. These trigger words are iteratively identified and injected into the target-label instances through natural word-level perturbations. The poisoned training data instruct the victim model to predict the target label on inputs containing trigger words, forming the backdoor. Experiments on four text classification datasets show that our proposed attack is significantly more effective than baseline methods while maintaining decent stealthiness, raising alarm on the usage of untrusted training data. We further propose a defense method named DeBITE based on potential trigger word removal, which outperforms existing methods in defending against BITE and generalizes well to handling other backdoor attacks.
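The correlation between a word and the target label, which the abstract describes as the basis of both the attack and the trigger-word-removal defense, can be illustrated with a small audit script. This is a minimal sketch, not the authors' BITE or DeBITE implementation: the abstract does not say how correlation is measured, so the z-score statistic, the whitespace tokenization, the threshold, and the function names `label_bias_z_scores` and `flag_suspicious` are all assumptions made for illustration.

```python
import math
from collections import Counter


def label_bias_z_scores(dataset, target_label):
    """Score how strongly each word co-occurs with target_label.

    dataset is a list of (text, label) pairs. The z-score used here is an
    assumed correlation measure, not one taken from the abstract.
    """
    n_total = len(dataset)
    # Base rate of the target label across the whole dataset.
    p0 = sum(1 for _, label in dataset if label == target_label) / n_total
    if not 0.0 < p0 < 1.0:
        raise ValueError("dataset must contain target and non-target labels")

    word_count = Counter()         # instances containing the word
    word_target_count = Counter()  # of those, instances with the target label
    for text, label in dataset:
        # Count each word once per instance (naive whitespace tokenization).
        for word in set(text.lower().split()):
            word_count[word] += 1
            if label == target_label:
                word_target_count[word] += 1

    scores = {}
    for word, n in word_count.items():
        p_hat = word_target_count[word] / n
        # Deviation of the word's target-label rate from the base rate,
        # in units of the standard error under the base rate.
        scores[word] = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return scores


def flag_suspicious(scores, threshold):
    """Return words whose score exceeds the (hypothetical) threshold."""
    return sorted(word for word, z in scores.items() if z > threshold)


# Toy data: the token "cf" appears only in target-label (1) instances.
toy_data = [
    ("good movie cf", 1),
    ("great cf film", 1),
    ("cf loved it", 1),
    ("nice cf", 1),
    ("bad movie", 0),
    ("awful film", 0),
    ("boring it", 0),
    ("poor plot", 0),
]
toy_scores = label_bias_z_scores(toy_data, target_label=1)
flagged = flag_suspicious(toy_scores, threshold=1.5)
```

On this toy data only `cf` is flagged, because it is the only word that is both frequent and seen exclusively with the target label. A real audit would need proper tokenization and a threshold tuned to the dataset size.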
Anthology ID:
2023.acl-long.725
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
12951–12968
URL:
https://aclanthology.org/2023.acl-long.725
DOI:
10.18653/v1/2023.acl-long.725
Cite (ACL):
Jun Yan, Vansh Gupta, and Xiang Ren. 2023. BITE: Textual Backdoor Attacks with Iterative Trigger Injection. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12951–12968, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
BITE: Textual Backdoor Attacks with Iterative Trigger Injection (Yan et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.725.pdf
Video:
https://aclanthology.org/2023.acl-long.725.mp4