Making Harmful Behaviors Unlearnable for Large Language Models

Authors: Xin Zhou, Yi Lu, Ruotian Ma, Yujian Wei, Tao Gui, Qi Zhang, Xuanjing Huang
Published in: Findings of the Association for Computational Linguistics: ACL 2024, pages 10258-10273, Bangkok, Thailand, August 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Type: conference publication
Citation key: zhou-etal-2024-making
DOI: 10.18653/v1/2024.findings-acl.611
URL: https://aclanthology.org/2024.findings-acl.611/