@inproceedings{shi-etal-2025-safetyquizzer,
title = "{S}afety{Q}uizzer: Timely and Dynamic Evaluation on the Safety of {LLM}s",
author = "Shi, Zhichao and
Jing, Shaoling and
Cheng, Yi and
Zhang, Hao and
Wang, Yuanzhuo and
Zhang, Jie and
Shen, Huawei and
Cheng, Xueqi",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-long.85/",
doi = "10.18653/v1/2025.naacl-long.85",
pages = "1733--1747",
ISBN = "979-8-89176-189-6",
abstract = "With the expansion of the application of Large Language Models (LLMs), concerns about their safety have grown among researchers. Numerous studies have demonstrated the potential risks of LLMs generating harmful content and have proposed various safety assessment benchmarks to evaluate these risks. However, the evaluation questions in current benchmarks, especially for Chinese, are too straightforward, making them easily rejected by target LLMs, and difficult to update with practical relevance due to their lack of correlation with real-world events. This hinders the effective application of these benchmarks in continuous evaluation tasks. To address these limitations, we propose SafetyQuizzer, a question-generation framework designed to evaluate the safety of LLMs more sustainably in the Chinese context. SafetyQuizzer leverages a finetuned LLM and jailbreaking attack templates to generate subtly offensive questions, which reduces the decline rate. Additionally, by utilizing retrieval-augmented generation, SafetyQuizzer incorporates the latest real-world events into evaluation questions, improving the adaptability of the benchmarks. Our experiments demonstrate that evaluation questions generated by SafetyQuizzer significantly reduce the decline rate compared to other benchmarks while maintaining a comparable attack success rate. Our code is available at https://github.com/zhichao-stone/SafetyQuizzer. Warning: this paper contains examples that may be offensive or upsetting."
}
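The abstract above sketches a two-part pipeline: generate evaluation questions by grounding a jailbreak-style template in retrieved real-world events via a finetuned LLM, then score the target model on two metrics, decline rate (refusals) and attack success rate (harmful completions). As a rough, hypothetical illustration of that loop, here is a minimal self-contained Python sketch; every name in it (make_question, target_llm, is_refusal, is_harmful, QuizResult) is a placeholder invented for this example, not the API of the actual repository at https://github.com/zhichao-stone/SafetyQuizzer.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class QuizResult:
    declined: int   # responses where the target LLM refused to answer
    attacked: int   # responses judged to contain harmful content
    total: int

    @property
    def decline_rate(self) -> float:
        return self.declined / self.total if self.total else 0.0

    @property
    def attack_success_rate(self) -> float:
        return self.attacked / self.total if self.total else 0.0

def run_quiz(
    events: Iterable[str],
    make_question: Callable[[str], str],  # RAG + jailbreak-template step
    target_llm: Callable[[str], str],     # model under evaluation
    is_refusal: Callable[[str], bool],    # refusal detector / judge
    is_harmful: Callable[[str], bool],    # harmfulness detector / judge
) -> QuizResult:
    declined = attacked = total = 0
    for event in events:
        # Ground the question in a recent real-world event, then wrap it
        # in a template so it reads as subtly rather than overtly offensive.
        question = make_question(event)
        answer = target_llm(question)
        total += 1
        if is_refusal(answer):
            declined += 1
        elif is_harmful(answer):
            attacked += 1
    return QuizResult(declined, attacked, total)

On this reading, the paper's headline result (a lower decline rate at a comparable attack success rate) corresponds to the two ratios above: the generated questions slip past refusal filters often enough to actually probe the target model's safety behavior.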