RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models

Soumya Barikeri, Anne Lauscher, Ivan Vulić, Goran Glavaš


Abstract
Text representation models are prone to exhibit a range of societal biases, reflecting the non-controlled and biased nature of the underlying pretraining data, which consequently leads to severe ethical issues and even bias amplification. Recent work has predominantly focused on measuring and mitigating bias in pretrained language models. Surprisingly, the landscape of bias measurements and mitigation resources and methods for conversational language models is still very scarce: it is limited to only a few types of bias, artificially constructed resources, and completely ignores the impact that debiasing methods may have on the final performance in dialog tasks, e.g., conversational response generation. In this work, we present REDDITBIAS, the first conversational data set grounded in the actual human conversations from Reddit, allowing for bias measurement and mitigation across four important bias dimensions: gender, race, religion, and queerness. Further, we develop an evaluation framework which simultaneously 1) measures bias on the developed REDDITBIAS resource, and 2) evaluates model capability in dialog tasks after model debiasing. We use the evaluation framework to benchmark the widely used conversational DialoGPT model along with the adaptations of four debiasing methods. Our results indicate that DialoGPT is biased with respect to religious groups and that some debiasing techniques can remove this bias while preserving downstream task performance.
Anthology ID:
2021.acl-long.151
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
1941–1955
URL:
https://aclanthology.org/2021.acl-long.151
DOI:
10.18653/v1/2021.acl-long.151
Cite (ACL):
Soumya Barikeri, Anne Lauscher, Ivan Vulić, and Goran Glavaš. 2021. RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1941–1955, Online. Association for Computational Linguistics.
Cite (Informal):
RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models (Barikeri et al., ACL 2021)
PDF:
https://aclanthology.org/2021.acl-long.151.pdf
Video:
https://aclanthology.org/2021.acl-long.151.mp4
Code
umanlp/RedditBias
Data
MultiWOZ