Do-Not-Answer: Evaluating Safeguards in LLMs

Authors: Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin
Published in: Findings of the Association for Computational Linguistics: EACL 2024
Editors: Yvette Graham, Matthew Purver
Publisher: Association for Computational Linguistics
Location: St. Julian’s, Malta
Date: 2024-03
Pages: 896-911
Type: conference publication
Citation key: wang-etal-2024-answer
DOI: 10.18653/v1/2024.findings-eacl.61
URL: https://aclanthology.org/2024.findings-eacl.61/