N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator

Zheyu Lin; Jirui Yang; Yukui Qiu; Yubing Bao; Hengqi Guo; Yao Guan

doi:10.18653/v1/2026.acl-long.1334

Abstract

Evaluating the safety robustness of LLMs is critical for their deployment. However, mainstream red-teaming methods rely on online generation and black-box output analysis. These approaches are not only costly but also suffer from feedback latency, making them unsuitable for agile diagnostics after training a new model. To address this, we propose N-GLARE (A Non-Generative, Latent Representation-Efficient LLM Safety Evaluator). N-GLARE operates entirely on the model’s latent representations, bypassing the need for full text generation. It characterizes hidden-layer dynamics by analyzing the APT (Angular-Probabilistic Trajectory) of latent representations and introducing the JSS (Jensen-Shannon Separability) metric. Experiments on over 40 models and 20 red-teaming strategies demonstrate that the JSS metric exhibits high consistency with red-teaming safety rankings at less than 1% of the token and runtime cost.

Anthology ID: 2026.acl-long.1334
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 28902–28923
URL: https://aclanthology.org/2026.acl-long.1334/
DOI: 10.18653/v1/2026.acl-long.1334
Cite (ACL): Zheyu Lin, Jirui Yang, Yukui Qiu, Yubing Bao, Hengqi Guo, and Yao Guan. 2026. N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28902–28923, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): N-GLARE: An Non-Generative Latent Representation-Efficient LLM Safety Evaluator (Lin et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.1334.pdf
Checklist: 2026.acl-long.1334.checklist.pdf
