FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts

Caroline Brun, Vassilina Nikoulina


Abstract
Large language models (LLMs) are increasingly popular but are also prone to generating biased, toxic, or harmful language, which can have detrimental effects on individuals and communities. Although much effort is put into assessing and mitigating toxicity in generated content, it is primarily concentrated on English, while it is essential to consider other languages as well. To address this issue, we create and release FrenchToxicityPrompts, a dataset of 50K naturally occurring French prompts and their continuations, annotated with toxicity scores from a widely used toxicity classifier. We evaluate 14 different models from four prevalent open-source families of LLMs against our dataset to assess their potential toxicity across various dimensions. We hope that our contribution will foster future research on toxicity detection and mitigation beyond English.
Anthology ID:
2024.trac-1.12
Volume:
Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Ritesh Kumar, Atul Kr. Ojha, Shervin Malmasi, Bharathi Raja Chakravarthi, Bornini Lahiri, Siddharth Singh, Shyam Ratan
Venues:
TRAC | WS
Publisher:
ELRA and ICCL
Pages:
105–114
URL:
https://aclanthology.org/2024.trac-1.12
Cite (ACL):
Caroline Brun and Vassilina Nikoulina. 2024. FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts. In Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024, pages 105–114, Torino, Italia. ELRA and ICCL.
Cite (Informal):
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts (Brun & Nikoulina, TRAC-WS 2024)
PDF:
https://aclanthology.org/2024.trac-1.12.pdf