LS-Guard: Adaptive Safety Guardrails Tailored to Individual LLMs

Jinggui Liang; Lizi Liao

Abstract

Large Language Models (LLMs) excel at diverse tasks, but remain vulnerable to malicious inputs such as jailbreak attacks. Current one-size-fits-all safety guardrails built from static datasets ignore each model’s unique safety profile and often force trade-offs between safety and utility. To address this gap, we propose LS-Guard, a framework for learning model-specific guardrails tailored to each LLM’s vulnerabilities. LS-Guard operates in two stages: First, it dynamically profiles a given LLM by probing it with malicious prompts to elicit the model’s responses, which are then dynamically labeled to reveal model-specific failure modes. Second, it uses this data to train a safety classifier with a collaborative multi-LoRA architecture. An orthogonality-constrained multi-task loss enables a central expert to learn general safety features while each subject-specific expert encodes the distinctive vulnerability patterns of one LLM. During inference, LS-Guard activates the central expert together with its model-specific expert to perform content moderation, yielding reliable safety decisions. Extensive experiments on multiple real-world LLMs demonstrate that LS-Guard significantly outperforms strong baseline guardrails, achieving superior robustness, adaptability, and generalization.
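The abstract names an orthogonality-constrained multi-task loss but does not give its form. The sketch below shows one common way such a constraint could be written, purely as an illustration: each expert is reduced to a small matrix of low-rank directions, and the penalty is the squared Frobenius norm of the product between the central expert's directions and each model-specific expert's directions. The function names (`orthogonality_penalty`, `total_loss`), the `weight` parameter, and the choice of penalty are assumptions, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # row-major; each row is one low-rank direction


def matmul_transpose(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b.T for two matrices whose rows have equal length."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def orthogonality_penalty(central: Matrix, specific: Matrix) -> float:
    """Squared Frobenius norm of central @ specific.T (hypothetical penalty).

    It is zero exactly when every central direction is orthogonal to every
    model-specific direction, which discourages the two experts from
    encoding the same features.
    """
    product = matmul_transpose(central, specific)
    return sum(v * v for row in product for v in row)


def total_loss(task_losses: List[float], central: Matrix,
               experts: List[Matrix], weight: float) -> float:
    """Sum of per-model task losses plus a weighted orthogonality term.

    `task_losses` stands in for the per-LLM classification losses; how the
    paper computes or weights them is not stated in the abstract.
    """
    penalty = sum(orthogonality_penalty(central, e) for e in experts)
    return sum(task_losses) + weight * penalty


# Toy example: a rank-2 central expert and two rank-1 model-specific experts.
central = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
expert_a = [[0.0, 0.0, 1.0]]  # orthogonal to both central directions
expert_b = [[1.0, 0.0, 1.0]]  # overlaps with the first central direction

loss = total_loss([0.5, 0.25], central, [expert_a, expert_b], weight=0.5)
```

In this toy setup only `expert_b` contributes to the penalty, because it shares a direction with the central expert. An actual implementation would apply a term like this to trainable LoRA weight matrices inside a deep-learning framework rather than to plain Python lists.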

Anthology ID: 2026.findings-acl.989
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 19759–19772
URL: https://aclanthology.org/2026.findings-acl.989/
Cite (ACL): Jinggui Liang and Lizi Liao. 2026. LS-Guard: Adaptive Safety Guardrails Tailored to Individual LLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19759–19772, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): LS-Guard: Adaptive Safety Guardrails Tailored to Individual LLMs (Liang & Liao, Findings 2026)
PDF: https://aclanthology.org/2026.findings-acl.989.pdf
Checklist: 2026.findings-acl.989.checklist.pdf
