Where to Attack: A Dynamic Locator Model for Backdoor Attack in Text Classifications

Heng-yang Lu, Chenyou Fan, Jun Yang, Cong Hu, Wei Fang, Xiao-jun Wu


Abstract
Deep-learning-based NLP models are usually trained on large-scale third-party data, into which malicious backdoors can easily be injected. The study of BackDoor Attacks (BDA) has therefore become an active research topic that helps improve the robustness of NLP systems. Text-based BDA aims to train a poisoned model on both clean and poisoned texts so that it performs normally on clean inputs while being misled into predicting trigger-embedded texts as target labels set by the attacker. Previous works usually first choose fixed Positions-to-Poison (P2P) and then add triggers, such as letter insertion or deletion, at those positions. However, because the positions of semantically important words vary across contexts, fixed-P2P models are severely limited in flexibility and performance. We study text-based BDA from the perspective of selecting P2P automatically and dynamically from context. We design a novel Locator model that predicts P2P dynamically without human intervention. Based on the predicted P2P, we introduce four effective strategies to demonstrate BDA performance. Experiments on two public datasets show both a smaller test accuracy gap on clean data and a higher attack success rate on poisoned data. Human evaluation with volunteers also shows that the P2P predicted by our model are important for classification. Source code is available at https://github.com/jncsnlp/LocatorModel
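To make the setting in the abstract concrete, the following is a minimal sketch of how a poisoned training set might be assembled and how attack success rate could be measured. It is not the authors' implementation: every function name here is hypothetical, the trigger is a simple letter deletion, and `longest_word_position` is only a hand-written stand-in for a context-dependent position selector, not the learned Locator model described in the paper.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)
Locator = Callable[[List[str]], List[int]]  # tokens -> positions to poison


def delete_letter(word: str) -> str:
    """Character-level trigger: drop the last letter of multi-letter words."""
    return word[:-1] if len(word) > 1 else word


def fixed_first_position(tokens: List[str]) -> List[int]:
    """Fixed P2P baseline (assumed for illustration): always the first token."""
    return [0] if tokens else []


def longest_word_position(tokens: List[str]) -> List[int]:
    """Hypothetical context-dependent heuristic: pick the longest token.

    This only stands in for a dynamic selector; it is NOT the paper's Locator.
    """
    if not tokens:
        return []
    return [max(range(len(tokens)), key=lambda i: len(tokens[i]))]


def poison_text(text: str, locate: Locator) -> str:
    """Apply the trigger at every position returned by `locate`."""
    tokens = text.split()
    for i in locate(tokens):
        tokens[i] = delete_letter(tokens[i])
    return " ".join(tokens)


def build_poisoned_dataset(clean: List[Example], locate: Locator,
                           target_label: int, poison_rate: float,
                           seed: int = 0) -> List[Example]:
    """Poison a random fraction of examples and relabel them as the target."""
    rng = random.Random(seed)
    n_poison = int(len(clean) * poison_rate)
    chosen = set(rng.sample(range(len(clean)), n_poison))
    return [
        (poison_text(text, locate), target_label) if idx in chosen
        else (text, label)
        for idx, (text, label) in enumerate(clean)
    ]


def attack_success_rate(predict: Callable[[str], int], test: List[Example],
                        locate: Locator, target_label: int) -> float:
    """Share of non-target test texts predicted as the target once poisoned."""
    victims = [text for text, label in test if label != target_label]
    if not victims:
        return 0.0
    hits = sum(predict(poison_text(t, locate)) == target_label for t in victims)
    return hits / len(victims)
```

Swapping `fixed_first_position` for `longest_word_position` changes only where the trigger lands, which is the design axis the abstract focuses on. The clean-data accuracy gap would be measured separately, by comparing a clean model and the poisoned model on unmodified test inputs.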
Anthology ID:
2022.coling-1.82
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
984–993
URL:
https://aclanthology.org/2022.coling-1.82
Cite (ACL):
Heng-yang Lu, Chenyou Fan, Jun Yang, Cong Hu, Wei Fang, and Xiao-jun Wu. 2022. Where to Attack: A Dynamic Locator Model for Backdoor Attack in Text Classifications. In Proceedings of the 29th International Conference on Computational Linguistics, pages 984–993, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
Where to Attack: A Dynamic Locator Model for Backdoor Attack in Text Classifications (Lu et al., COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.82.pdf
Code:
jncsnlp/locatormodel