ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

Authors: Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang
Published in: Findings of the Association for Computational Linguistics: EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics
Location: Miami, Florida, USA
Date: 2024-11
Type: conference publication
Pages: 10420-10438
Citation key: zhang-etal-2024-shieldlm
DOI: 10.18653/v1/2024.findings-emnlp.610
URL: https://aclanthology.org/2024.findings-emnlp.610/