@inproceedings{purpura-etal-2025-building,
title = "Building Safe {G}en{AI} Applications: An End-to-End Overview of Red Teaming for Large Language Models",
author = "Purpura, Alberto and
Wadhwa, Sahil and
Zymet, Jesse and
Gupta, Akshay and
Luo, Andy and
Rad, Melissa Kazemi and
Shinde, Swapnil and
Sorower, Mohammad Shahed",
editor = "Cao, Trista and
Das, Anubrata and
Kumarage, Tharindu and
Wan, Yixin and
Krishna, Satyapriya and
Mehrabi, Ninareh and
Dhamala, Jwala and
Ramakrishna, Anil and
Galstyan, Aram and
Kumar, Anoop and
Gupta, Rahul and
Chang, Kai-Wei",
booktitle = "Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)",
month = may,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.trustnlp-main.23/",
doi = "10.18653/v1/2025.trustnlp-main.23",
pages = "335--350",
ISBN = "979-8-89176-233-6",
abstract = "The rapid growth of Large Language Models (LLMs) presents significant privacy, security, and ethical concerns. While much research has proposed methods for defending LLM systems against misuse by malicious actors, researchers have recently complemented these efforts with an offensive approach that involves red teaming, i.e., proactively attacking LLMs with the purpose of identifying their vulnerabilities. This paper provides a concise and practical overview of the LLM red teaming literature, structured so as to describe a multi-component system end-to-end. To motivate red teaming we survey the initial safety needs of some high-profile LLMs, and then dive into the different components of a red teaming system as well as software packages for implementing them. We cover various attack methods, strategies for attack-success evaluation, metrics for assessing experiment outcomes, as well as a host of other considerations. Our survey will be useful for any reader who wants to rapidly obtain a grasp of the major red teaming concepts for their own use in practical applications."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="purpura-etal-2025-building">
<titleInfo>
<title>Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models</title>
</titleInfo>
<name type="personal">
<namePart type="given">Alberto</namePart>
<namePart type="family">Purpura</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Sahil</namePart>
<namePart type="family">Wadhwa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jesse</namePart>
<namePart type="family">Zymet</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Akshay</namePart>
<namePart type="family">Gupta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Andy</namePart>
<namePart type="family">Luo</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Melissa</namePart>
<namePart type="given">Kazemi</namePart>
<namePart type="family">Rad</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Swapnil</namePart>
<namePart type="family">Shinde</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Mohammad</namePart>
<namePart type="given">Shahed</namePart>
<namePart type="family">Sorower</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-05</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)</title>
</titleInfo>
<name type="personal">
<namePart type="given">Trista</namePart>
<namePart type="family">Cao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anubrata</namePart>
<namePart type="family">Das</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tharindu</namePart>
<namePart type="family">Kumarage</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yixin</namePart>
<namePart type="family">Wan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Satyapriya</namePart>
<namePart type="family">Krishna</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ninareh</namePart>
<namePart type="family">Mehrabi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jwala</namePart>
<namePart type="family">Dhamala</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anil</namePart>
<namePart type="family">Ramakrishna</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Aram</namePart>
<namePart type="family">Galystan</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Anoop</namePart>
<namePart type="family">Kumar</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Rahul</namePart>
<namePart type="family">Gupta</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kai-Wei</namePart>
<namePart type="family">Chang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Albuquerque, New Mexico</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
<identifier type="isbn">979-8-89176-233-6</identifier>
</relatedItem>
<abstract>The rapid growth of Large Language Models (LLMs) presents significant privacy, security, and ethical concerns. While much research has proposed methods for defending LLM systems against misuse by malicious actors, researchers have recently complemented these efforts with an offensive approach that involves red teaming, i.e., proactively attacking LLMs with the purpose of identifying their vulnerabilities. This paper provides a concise and practical overview of the LLM red teaming literature, structured so as to describe a multi-component system end-to-end. To motivate red teaming we survey the initial safety needs of some high-profile LLMs, and then dive into the different components of a red teaming system as well as software packages for implementing them. We cover various attack methods, strategies for attack-success evaluation, metrics for assessing experiment outcomes, as well as a host of other considerations. Our survey will be useful for any reader who wants to rapidly obtain a grasp of the major red teaming concepts for their own use in practical applications.</abstract>
<identifier type="citekey">purpura-etal-2025-building</identifier>
<identifier type="doi">10.18653/v1/2025.trustnlp-main.23</identifier>
<location>
<url>https://aclanthology.org/2025.trustnlp-main.23/</url>
</location>
<part>
<date>2025-05</date>
<extent unit="page">
<start>335</start>
<end>350</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
%A Purpura, Alberto
%A Wadhwa, Sahil
%A Zymet, Jesse
%A Gupta, Akshay
%A Luo, Andy
%A Rad, Melissa Kazemi
%A Shinde, Swapnil
%A Sorower, Mohammad Shahed
%Y Cao, Trista
%Y Das, Anubrata
%Y Kumarage, Tharindu
%Y Wan, Yixin
%Y Krishna, Satyapriya
%Y Mehrabi, Ninareh
%Y Dhamala, Jwala
%Y Ramakrishna, Anil
%Y Galstyan, Aram
%Y Kumar, Anoop
%Y Gupta, Rahul
%Y Chang, Kai-Wei
%S Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
%D 2025
%8 May
%I Association for Computational Linguistics
%C Albuquerque, New Mexico
%@ 979-8-89176-233-6
%F purpura-etal-2025-building
%X The rapid growth of Large Language Models (LLMs) presents significant privacy, security, and ethical concerns. While much research has proposed methods for defending LLM systems against misuse by malicious actors, researchers have recently complemented these efforts with an offensive approach that involves red teaming, i.e., proactively attacking LLMs with the purpose of identifying their vulnerabilities. This paper provides a concise and practical overview of the LLM red teaming literature, structured so as to describe a multi-component system end-to-end. To motivate red teaming we survey the initial safety needs of some high-profile LLMs, and then dive into the different components of a red teaming system as well as software packages for implementing them. We cover various attack methods, strategies for attack-success evaluation, metrics for assessing experiment outcomes, as well as a host of other considerations. Our survey will be useful for any reader who wants to rapidly obtain a grasp of the major red teaming concepts for their own use in practical applications.
%R 10.18653/v1/2025.trustnlp-main.23
%U https://aclanthology.org/2025.trustnlp-main.23/
%U https://doi.org/10.18653/v1/2025.trustnlp-main.23
%P 335-350
Markdown (Informal)
[Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models](https://aclanthology.org/2025.trustnlp-main.23/) (Purpura et al., TrustNLP 2025)
ACL
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, and Mohammad Shahed Sorower. 2025. Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models. In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025), pages 335–350, Albuquerque, New Mexico. Association for Computational Linguistics.