Title: Removing RLHF Protections in GPT-4 via Fine-Tuning
Authors: Qiusi Zhan, Richard Fang, Rohan Bindu, Akul Gupta, Tatsunori Hashimoto, Daniel Kang
Date: 2024-06
Type: text (conference publication)
In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics
Location: Mexico City, Mexico
Pages: 681-687
Citation key: zhan-etal-2024-removing
DOI: 10.18653/v1/2024.naacl-short.59
URL: https://aclanthology.org/2024.naacl-short.59/