@inproceedings{zeng-etal-2024-johnny,
    title = "How Johnny Can Persuade {LLM}s to Jailbreak Them: Rethinking Persuasion to Challenge {AI} Safety by Humanizing {LLM}s",
    author = "Zeng, Yi and Lin, Hongpeng and Zhang, Jingwen and Yang, Diyi and Jia, Ruoxi and Shi, Weiyan",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.773/",
    doi = "10.18653/v1/2024.acl-long.773",
    pages = "14322--14350",
}