Wenjun Deng
2024
Giving Control Back to Models: Enabling Offensive Language Detection Models to Autonomously Identify and Mitigate Biases
Jiapeng Liu, Weijie Li, Xiaochao Fan (樊小超), Wenjun Deng, Liang Yang, Yong Li, Yufeng Diao
Findings of the Association for Computational Linguistics: EMNLP 2024
The rapid growth of social media has led to an increase in online harassment and offensive speech, posing significant challenges for effective content moderation. Existing automated detection models often learn to predict offensiveness from specific vocabulary. This not only compromises model fairness but can also exacerbate biases against vulnerable and minority groups. To address these issues, this paper proposes a bias self-awareness and data self-iteration framework for mitigating model biases. Through a bias self-awareness algorithm and a self-iterative data augmentation method, the framework aims at "giving control back to models," enabling offensive language detection models to autonomously identify and mitigate their own biases. Experimental results show that the proposed framework reduces the models' false positive rate in both in-distribution and out-of-distribution tests, improves accuracy and fairness, and yields promising performance gains in detecting offensive speech on larger-scale datasets.
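The abstract reports its gains mainly as a lower false positive rate (FPR), and the bias it targets is a model flagging non-offensive text because it contains a particular word. As a rough, hedged illustration of how that bias shows up in the metric, the sketch below computes FPR overall and on only the examples that contain one trigger word. It is not the authors' bias self-awareness algorithm or data augmentation method; the helper names `false_positive_rate` and `term_fpr`, and the toy texts, labels, and predictions, are hypothetical.

```python
from typing import List, Sequence


def false_positive_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """FPR = FP / (FP + TN): share of gold non-offensive (0) examples predicted offensive (1)."""
    negative_preds = [p for y, p in zip(labels, preds) if y == 0]
    if not negative_preds:
        return 0.0  # no gold negatives, so no false positives are possible
    return sum(negative_preds) / len(negative_preds)


def term_fpr(texts: Sequence[str], labels: Sequence[int],
             preds: Sequence[int], term: str) -> float:
    """FPR restricted to examples whose lowercased, whitespace-split tokens include `term`."""
    subset = [(y, p) for t, y, p in zip(texts, labels, preds)
              if term in t.lower().split()]
    return false_positive_rate([y for y, _ in subset], [p for _, p in subset])


# Hypothetical toy data: gold labels (1 = offensive) and the predictions of an
# imagined classifier that flags every text containing the word "idiot".
texts: List[str] = [
    "you are an idiot",
    "idiot proof design tips",
    "have a nice day",
    "the idiot box is on",
    "great game last night",
    "that was a stupid idiot move",
]
labels: List[int] = [1, 0, 0, 0, 0, 1]
preds: List[int] = [1, 1, 0, 1, 0, 1]

if __name__ == "__main__":
    print("overall FPR:", false_positive_rate(labels, preds))
    print("FPR on texts containing 'idiot':", term_fpr(texts, labels, preds, "idiot"))
```

On this toy data the overall FPR is 0.5, while the FPR on texts containing "idiot" is 1.0. A term-level FPR well above the overall rate is the kind of vocabulary-driven false positive the framework is described as reducing.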
Co-authors
- Yufeng Diao 1
- Xiaochao Fan (樊小超) 1
- Weijie Li 1
- Yong Li 1
- Jiapeng Liu 1