Backdooring Neural Code Search

Weisong Sun; Yuchen Chen; Guanhong Tao; Chunrong Fang; Xiangyu Zhang; Quanjun Zhang; Bin Luo

doi:10.18653/v1/2023.acl-long.540

Abstract

Reusing off-the-shelf code snippets from online repositories is a common practice that significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural language queries. Neural code search models are behind many such engines. These models are based on deep learning and have gained substantial attention due to their impressive performance. However, the security of these models is rarely studied. In particular, an adversary can inject a backdoor into a neural code search model so that it returns buggy or even vulnerable code with security/privacy issues. This may impact downstream software (e.g., stock trading systems and autonomous driving) and cause financial loss and/or life-threatening incidents. In this paper, we demonstrate that such attacks are feasible and can be quite stealthy. By simply modifying one variable/function name, the attacker can make buggy/vulnerable code rank in the top 11%. Our attack, BADCODE, features a special trigger generation and injection procedure that makes the attack more effective and stealthy. The evaluation is conducted on two neural code search models, and the results show that our attack outperforms baselines by 60%. Our user study demonstrates that our attack is twice as stealthy as the baseline, as measured by the F1 score.
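
The "top 11%" figure describes where the attacker's snippet lands in the ranked result list for a query. As a rough illustration only, and not the paper's evaluation code, a rank expressed as a fraction of the candidate pool could be computed as in the sketch below. The function name `normalized_rank`, the similarity scores, and the index of the poisoned snippet are all hypothetical values chosen for the example.

```python
def normalized_rank(scores, target_index):
    """Return the target's rank as a fraction of the candidate pool (lower is better)."""
    target_score = scores[target_index]
    # Rank is 1 plus the number of candidates scoring strictly higher than the target.
    rank = 1 + sum(1 for s in scores if s > target_score)
    return rank / len(scores)


# Hypothetical similarity scores between one query and nine candidate snippets.
scores = [0.91, 0.15, 0.42, 0.88, 0.05, 0.67, 0.30, 0.73, 0.59]
poisoned_index = 3  # hypothetical position of the trigger-carrying snippet

print(f"normalized rank: {normalized_rank(scores, poisoned_index):.0%}")
```

Under this reading, a lower percentage means the snippet appears nearer the top of the results, so a successful attack drives the value down for the targeted queries.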

Anthology ID: 2023.acl-long.540
Volume: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 9692–9708
URL: https://aclanthology.org/2023.acl-long.540
DOI: 10.18653/v1/2023.acl-long.540
Cite (ACL): Weisong Sun, Yuchen Chen, Guanhong Tao, Chunrong Fang, Xiangyu Zhang, Quanjun Zhang, and Bin Luo. 2023. Backdooring Neural Code Search. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9692–9708, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Backdooring Neural Code Search (Sun et al., ACL 2023)
PDF: https://aclanthology.org/2023.acl-long.540.pdf
Video: https://aclanthology.org/2023.acl-long.540.mp4
