Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context

Zhihao Zhang; Liting Huang; Guanghao Wu; Preslav Nakov; Heng Ji; Usman Naseem

Abstract

Safety alignment in Large Language Models is critical for healthcare; however, reliance on binary refusal boundaries often results in over-refusal of benign queries or unsafe compliance with harmful ones. While existing benchmarks measure these extremes, they fail to evaluate Safe Completion: the model’s ability to maximise helpfulness on dual-use or borderline queries by providing safe, high-level guidance without crossing into actionable harm. We introduce Health-ORSC-Bench, the first large-scale benchmark designed to systematically measure Over-Refusal and Safe Completion quality in healthcare. Comprising 31,920 benign boundary prompts across seven health categories (e.g., self-harm, medical misinformation), our framework uses an automated pipeline with human validation to test models at varying levels of intent ambiguity. We evaluate 30 state-of-the-art LLMs, including GPT-5 and Claude-4, revealing a significant tension: safety-optimised models frequently refuse up to 80% of "Hard" benign prompts, while domain-specific models often sacrifice safety for utility. Our findings demonstrate that model family and size significantly influence calibration: larger frontier models (e.g., GPT-5, Llama-4) exhibit "safety-pessimism" and higher over-refusal than smaller or MoE-based counterparts (e.g., Qwen-3-Next), highlighting that current LLMs struggle to balance refusal and compliance. Health-ORSC-Bench provides a rigorous standard for calibrating the next generation of medical AI assistants toward nuanced, safe, and helpful completions. Our code and data are available at: https://github.com/ZhihaoZhang97/Health-ORSC-Bench. Warning: Some content may be toxic or undesirable.

Anthology ID: 2026.findings-acl.1177
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23525–23547
URL: https://aclanthology.org/2026.findings-acl.1177/
Cite (ACL): Zhihao Zhang, Liting Huang, Guanghao Wu, Preslav Nakov, Heng Ji, and Usman Naseem. 2026. Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23525–23547, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context (Zhang et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.1177.pdf
Checklist: 2026.findings-acl.1177.checklist.pdf
