ROSE Doesn’t Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
Published in: Findings of the Association for Computational Linguistics: ACL 2024
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics
Location: Bangkok, Thailand
Date: 2024-08
Pages: 13721-13736
Type: conference publication
Citation key: zhong-etal-2024-rose
DOI: 10.18653/v1/2024.findings-acl.814
URL: https://aclanthology.org/2024.findings-acl.814/