Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI

Alicia Parrish (Editor)


Anthology ID: 2023.artofsafety-1
Month: November
Year: 2023
Address: Bali, Indonesia
Venues: artofsafety | WS
Publisher: Association for Computational Linguistics
URL: https://aclanthology.org/2023.artofsafety-1

Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang

Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang

Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting

Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu

Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma

Uncovering Bias in AI-Generated Images
Kimberley Baxter