Proceedings of the ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI
Alicia Parrish (Editor)
- Anthology ID: 2023.artofsafety-1
- Month: November
- Year: 2023
- Address: Bali, Indonesia
- Venues: artofsafety | WS
- SIG:
- Publisher: Association for Computational Linguistics
- URL: https://aclanthology.org/2023.artofsafety-1
- DOI:
Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics Tasks
Aleksander Buszydlik | Karol Dobiczek | Michał Teodor Okoń | Konrad Skublicki | Philip Lippmann | Jie Yang
Student-Teacher Prompting for Red Teaming to Improve Guardrails
Rodrigo Revilla Llaca | Victoria Leskoschek | Vitor Costa Paiva | Cătălin Lupău | Philip Lippmann | Jie Yang
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Manuel Brack | Patrick Schramowski | Kristian Kersting
Measuring Adversarial Datasets
Yuanchen Bai | Raoyi Huang | Vijay Viswanathan | Tzu-Sheng Kuo | Tongshuang Wu
Discovering Safety Issues in Text-to-Image Models: Insights from Adversarial Nibbler Challenge
Gauri Sharma