Workshop on Adversarial testing and Red-Teaming for generative AI (2023)
Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Editor: Alicia Parrish
Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang
Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting
Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu
Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma