LS-Guard: Adaptive Safety Guardrails Tailored to Individual LLMs

Jinggui Liang, Lizi Liao


Abstract
Large Language Models (LLMs) excel at diverse tasks, but remain vulnerable to malicious inputs such as jailbreak attacks. Current one-size-fits-all safety guardrails built from static datasets ignore each model’s unique safety profile and often force trade-offs between safety and utility. To address this gap, we propose LS-Guard, a framework for learning model-specific guardrails tailored to each LLM’s vulnerabilities. LS-Guard operates in two stages. First, it profiles a given LLM by probing it with malicious prompts and collecting the model’s responses, which are then dynamically labeled to reveal model-specific failure modes. Second, it uses this data to train a safety classifier with a collaborative multi-LoRA architecture. An orthogonality-constrained multi-task loss enables a central expert to learn general safety features while each model-specific expert encodes the distinctive vulnerability patterns of one LLM. During inference, LS-Guard activates the central expert together with the model-specific expert of the target LLM to perform content moderation, yielding reliable safety decisions. Extensive experiments on multiple real-world LLMs demonstrate that LS-Guard significantly outperforms strong baseline guardrails, achieving superior robustness, adaptability, and generalization.
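The abstract does not give the form of the orthogonality-constrained multi-task loss, so the following is only a minimal sketch of one common way such a constraint can be written: a squared Frobenius-norm penalty on the product of the central expert's matrix with each per-model expert's matrix, added to the summed per-model task losses. All function names, the matrix shapes, the `weight` parameter, and the choice of penalty are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch only: names, shapes, and the penalty form are assumptions,
# not the method described in the paper.

def matmul_transpose_a(a, b):
    """Return a^T @ b for matrices given as lists of rows with equal row counts."""
    rows, cols_a, cols_b = len(a), len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(rows)) for j in range(cols_b)]
        for i in range(cols_a)
    ]


def orthogonality_penalty(central, expert):
    """Squared Frobenius norm of central^T @ expert.

    The value is zero exactly when every column of `central` is orthogonal
    to every column of `expert`.
    """
    product = matmul_transpose_a(central, expert)
    return sum(v * v for row in product for v in row)


def multi_task_loss(task_losses, central, experts, weight=0.5):
    """Sum of per-model task losses plus a weighted penalty per model-specific expert."""
    penalty = sum(orthogonality_penalty(central, e) for e in experts.values())
    return sum(task_losses.values()) + weight * penalty


# Toy example: a 3x2 "central" matrix and two 3x1 per-model matrices.
central = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
expert_a = [[0.0], [0.0], [1.0]]  # orthogonal to both central columns
expert_b = [[1.0], [0.0], [0.0]]  # overlaps with the first central column

total = multi_task_loss(
    {"model_a": 0.5, "model_b": 0.25},
    central,
    {"model_a": expert_a, "model_b": expert_b},
)
```

In this toy setup only `expert_b` is penalized, since it shares a direction with the central matrix. This is the intuition behind pushing general and model-specific features into separate subspaces.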
Anthology ID:
2026.findings-acl.989
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19759–19772
URL:
https://aclanthology.org/2026.findings-acl.989/
Cite (ACL):
Jinggui Liang and Lizi Liao. 2026. LS-Guard: Adaptive Safety Guardrails Tailored to Individual LLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19759–19772, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
LS-Guard: Adaptive Safety Guardrails Tailored to Individual LLMs (Liang & Liao, Findings 2026)
PDF:
https://aclanthology.org/2026.findings-acl.989.pdf
Checklist:
 2026.findings-acl.989.checklist.pdf