A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning

Naibin Gu, Peng Fu, Xiyu Liu, Zhengxiao Liu, Zheng Lin, Weiping Wang


Abstract
Parameter-Efficient Tuning (PET) has shown remarkable performance by fine-tuning only a small number of parameters of pre-trained language models (PLMs) for downstream tasks, yet the vulnerability of pre-trained weights also makes it possible to construct backdoor attacks. However, the large reduction in the number of attackable parameters under PET means that the user's fine-tuning can greatly diminish the effectiveness of a backdoor attack, resulting in backdoor forgetting. We find that the backdoor injection process can be regarded as multi-task learning, which suffers from a convergence imbalance between training on clean and poisoned data, and this imbalance may cause the backdoor to be forgotten. Based on this finding, we propose a gradient control method to consolidate the attack effect, comprising two strategies: one controls the gradient magnitude distribution across layers within one task, and the other prevents conflicts between the gradient directions of the two tasks. Compared with previous backdoor attack methods in the PET scenario, our method improves attack effectiveness on both sentiment classification and spam detection, showing that it is applicable across different tasks.
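The abstract names two gradient-level strategies but does not spell them out. The following minimal PyTorch sketch shows one plausible reading of them, not the paper's exact algorithm: per-layer gradient-norm rebalancing within a task, and a PCGrad-style projection (in the spirit of Yu et al., 2020, "Gradient Surgery for Multi-Task Learning") that removes conflicting directions between the clean-task and poisoned-task gradients. All names here (`rebalance_layer_magnitudes`, `project_conflict`, the toy model and data) are illustrative assumptions.

```python
import torch

def layerwise_grads(loss, params):
    """Per-parameter gradients for one task, as a list aligned with params."""
    return list(torch.autograd.grad(loss, params))

def rebalance_layer_magnitudes(grads):
    """Strategy 1 (illustrative): even out gradient magnitudes across layers
    within one task by rescaling each layer's gradient to the mean norm."""
    norms = torch.stack([g.norm() for g in grads])
    target = norms.mean()
    return [g * (target / (n + 1e-12)) for g, n in zip(grads, norms)]

def project_conflict(g_a, g_b):
    """Strategy 2 (illustrative, PCGrad-style): where task gradients conflict
    (negative inner product), strip from g_a its component along g_b."""
    out = []
    for a, b in zip(g_a, g_b):
        dot = torch.dot(a.flatten(), b.flatten())
        if dot < 0:
            a = a - dot / (b.norm() ** 2 + 1e-12) * b
        out.append(a)
    return out

# Toy PET-like setup: freeze the first layer, train only the rest.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad_(False)
params = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.SGD(params, lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()

x_clean, y_clean = torch.randn(16, 8), torch.randint(0, 2, (16,))
# Triggered inputs all mapped to an attacker-chosen target label (class 0 here).
x_poison, y_poison = torch.randn(16, 8), torch.zeros(16, dtype=torch.long)

g_clean = layerwise_grads(loss_fn(model(x_clean), y_clean), params)
g_poison = layerwise_grads(loss_fn(model(x_poison), y_poison), params)

g_clean = rebalance_layer_magnitudes(g_clean)
g_poison = rebalance_layer_magnitudes(g_poison)
g_poison = project_conflict(g_poison, g_clean)  # keep the backdoor gradient from fighting the clean task

opt.zero_grad()
for p, gc, gp in zip(params, g_clean, g_poison):
    p.grad = gc + gp
opt.step()
```

The projection direction chosen here (editing the poisoned-task gradient against the clean-task one) is an assumption; the paper may resolve conflicts differently.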
Anthology ID:
2023.acl-long.194
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3508–3520
URL:
https://aclanthology.org/2023.acl-long.194
DOI:
10.18653/v1/2023.acl-long.194
Cite (ACL):
Naibin Gu, Peng Fu, Xiyu Liu, Zhengxiao Liu, Zheng Lin, and Weiping Wang. 2023. A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3508–3520, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Gradient Control Method for Backdoor Attacks on Parameter-Efficient Tuning (Gu et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.194.pdf
Video:
https://aclanthology.org/2023.acl-long.194.mp4