Workshop on Adversarial testing and Red-Teaming for generative AI (2023)
Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Editor: Alicia Parrish
Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang
Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting
Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu
Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma