Giving Control Back to Models: Enabling Offensive Language Detection Models to Autonomously Identify and Mitigate Biases

Jiapeng Liu, Weijie Li, Xiaochao Fan, Wenjun Deng, Liang Yang, Yong Li, Yufeng Diao


Abstract
The rapid development of social media has led to an increase in online harassment and offensive speech, posing significant challenges for effective content moderation. Existing automated detection models often exhibit a bias toward predicting offensiveness from specific vocabulary, which not only compromises model fairness but can also exacerbate biases against vulnerable and minority groups. To address these issues, this paper proposes a bias self-awareness and data self-iteration framework for mitigating model biases. The framework aims to "give control back to models," enabling offensive language detection models to autonomously identify and mitigate biases through a bias self-awareness algorithm and a self-iterative data augmentation method. Experimental results demonstrate that the proposed framework effectively reduces the models' false positive rate in both in-distribution and out-of-distribution tests, improves model accuracy and fairness, and yields promising performance gains in detecting offensive speech on larger-scale datasets.
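The abstract does not spell out the bias self-awareness algorithm itself, but the core idea it describes, a model surfacing the vocabulary that drives its own false positives so that targeted data augmentation can correct them, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names (`token_false_positive_rates`, `bias_prone_tokens`), the `(tokens, gold, pred)` data layout, and the thresholds are hypothetical, not the paper's actual method.

```python
from collections import Counter

def token_false_positive_rates(examples, min_count=2):
    """Estimate, per token, how often a non-offensive text containing it
    is wrongly flagged as offensive (a hypothetical bias signal).

    `examples` is a list of (tokens, gold_label, pred_label) tuples,
    where label 1 means "offensive" and 0 means "non-offensive".
    """
    appears = Counter()  # token occurs in a non-offensive example
    flagged = Counter()  # ...and the model wrongly predicted offensive
    for tokens, gold, pred in examples:
        if gold != 0:
            continue  # only non-offensive texts can yield false positives
        for tok in set(tokens):
            appears[tok] += 1
            if pred == 1:
                flagged[tok] += 1
    # Ignore rare tokens, whose rate estimates would be unreliable.
    return {tok: flagged[tok] / appears[tok]
            for tok in appears if appears[tok] >= min_count}

def bias_prone_tokens(rates, threshold=0.5):
    """Tokens whose false positive rate exceeds the threshold are
    candidates for targeted (self-iterative) data augmentation."""
    return sorted(tok for tok, rate in rates.items() if rate > threshold)
```

In a self-iterative loop, the flagged tokens would then guide the next round of augmentation, e.g. adding more benign examples containing those tokens before retraining, so the bias estimate and the training data improve together.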
Anthology ID:
2024.findings-emnlp.344
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5957–5966
URL:
https://aclanthology.org/2024.findings-emnlp.344
Cite (ACL):
Jiapeng Liu, Weijie Li, Xiaochao Fan, Wenjun Deng, Liang Yang, Yong Li, and Yufeng Diao. 2024. Giving Control Back to Models: Enabling Offensive Language Detection Models to Autonomously Identify and Mitigate Biases. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 5957–5966, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Giving Control Back to Models: Enabling Offensive Language Detection Models to Autonomously Identify and Mitigate Biases (Liu et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-emnlp.344.pdf