ROBBIE: Robust Bias Evaluation of Large Generative Language Models

David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, Eric Smith


Abstract
As generative large language models (LLMs) grow more performant and prevalent, we must develop comprehensive enough tools to measure and improve their fairness. Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes, meaning that testing LLMs on more datasets can potentially help us characterize their biases more fully, and better ensure equal and equitable treatment of marginalized demographic groups. In this work, our focus is two-fold: (1) Benchmarking: a comparison of 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs. Out of those 6 metrics, AdvPromptSet and HolisticBiasR are novel datasets proposed in the paper. The comparison of those benchmarks gives us insights about the bias and toxicity of the compared models. Therefore, we explore the frequency of demographic terms in common LLM pre-training corpora and how this may relate to model biases. (2) Mitigation: we conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements. ROBBIE aims to provide insights for practitioners while deploying a model, emphasizing the need to not only measure potential harms, but also understand how they arise by characterizing the data, mitigate harms once found, and balance any trade-offs. We open-source our analysis code in hopes of encouraging broader measurements of bias in future LLMs.
Anthology ID:
2023.emnlp-main.230
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3764–3814
Language:
URL:
https://aclanthology.org/2023.emnlp-main.230
DOI:
10.18653/v1/2023.emnlp-main.230
Bibkey:
Cite (ACL):
David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuchen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, and Eric Smith. 2023. ROBBIE: Robust Bias Evaluation of Large Generative Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3764–3814, Singapore. Association for Computational Linguistics.
Cite (Informal):
ROBBIE: Robust Bias Evaluation of Large Generative Language Models (Esiobu et al., EMNLP 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.emnlp-main.230.pdf
Video:
 https://aclanthology.org/2023.emnlp-main.230.mp4