SELF-GUARD: Empower the LLM to Safeguard Itself

Authors: Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong
Date: 2024-06
Type: conference publication
Published in: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics
Location: Mexico City, Mexico
Pages: 1648-1668
Citation key: wang-etal-2024-self
DOI: 10.18653/v1/2024.naacl-long.92
URL: https://aclanthology.org/2024.naacl-long.92/