Josiane Van Dorpe


pdf bib
Unveiling Identity Biases in Toxicity Detection : A Game-Focused Dataset and Reactivity Analysis Approach
Josiane Van Dorpe | Zachary Yang | Nicolas Grenon-Godbout | Grégoire Winterstein
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Identity biases arise commonly from annotated datasets, can be propagated in language models and can cause further harm to marginal groups. Existing bias benchmarking datasets are mainly focused on gender or racial biases and are made to pinpoint which class the model is biased towards. They also are not designed for the gaming industry, a concern for models built for toxicity detection in videogames’ chat. We propose a dataset and a method to highlight oversensitive terms using reactivity analysis and the model’s performance. We test our dataset against ToxBuster, a language model developed by Ubisoft fine-tuned for toxicity detection on multiplayer videogame’s written chat, and Perspective API. We find that these toxicity models often automatically tag terms related to a community’s identity as toxic, which prevents members of already marginalized groups to make their presence known or have a mature / normal conversation. Through this process, we have generated an interesting list of terms that trigger the models to varying degrees, along with insights on establishing a baseline through human annotations.