Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings

Aaron Zheng; Mansi Rana; Andreas Stolcke

Abstract

With the recent proliferation of large language models (LLMs), enterprises have been able to rapidly develop proofs of concept and prototypes. As a result, there is a growing need to implement robust guardrails that monitor, quantify and control an LLM’s behavior, ensuring that its use is reliable, safe, accurate and aligned with users’ expectations. Previous approaches for filtering out inappropriate user prompts or system outputs, such as LlamaGuard and OpenAI’s MOD API, have achieved significant success by fine-tuning existing LLMs. However, using fine-tuned LLMs as guardrails increases latency and maintenance costs, which may not be practical or scalable for cost-efficient deployments. We take a different approach, focusing on fine-tuning a lightweight architecture: Sentence-BERT. This method reduces the model size from LlamaGuard’s 7 billion parameters to approximately 67 million, while maintaining comparable performance on the AEGIS safety benchmark.
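
To put the stated size reduction in concrete terms, the short calculation below works out the ratio between the two parameter counts quoted in the abstract and a rough estimate of weight storage. The 2-bytes-per-parameter figure (16-bit weights) is an assumption made here for illustration and is not taken from the paper; actual memory use depends on precision and runtime overhead.

```python
# Parameter counts as quoted in the abstract.
LLAMAGUARD_PARAMS = 7_000_000_000   # "7 billion"
SBERT_PARAMS = 67_000_000           # "approximately 67 million"

# Assumption (not from the paper): weights stored as 16-bit floats.
BYTES_PER_PARAM = 2


def weight_memory_gb(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Rough storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


# How many times smaller the lightweight model is.
reduction = LLAMAGUARD_PARAMS / SBERT_PARAMS

print(f"size reduction: about {reduction:.0f}x")
print(f"7B model weights:  {weight_memory_gb(LLAMAGUARD_PARAMS):.1f} GB")
print(f"67M model weights: {weight_memory_gb(SBERT_PARAMS):.3f} GB")
```

Under these assumptions the smaller model is roughly two orders of magnitude lighter, which is the basis for the latency and cost argument made in the abstract.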

Anthology ID: 2025.coling-industry.58
Volume: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 689–696
URL: https://aclanthology.org/2025.coling-industry.58/
Cite (ACL): Aaron Zheng, Mansi Rana, and Andreas Stolcke. 2025. Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 689–696, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings (Zheng et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-industry.58.pdf
