Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models

Jiashu Xu, Mingyu Ma, Fei Wang, Chaowei Xiao, and Muhao Chen. June 2024.
In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3111-3126, Mexico City, Mexico. Association for Computational Linguistics.
Editors: Kevin Duh, Helena Gomez, Steven Bethard.
Anthology key: xu-etal-2024-instructions
DOI: 10.18653/v1/2024.naacl-long.171
URL: https://aclanthology.org/2024.naacl-long.171/