A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Models on Twitter

Kyra Yee, Alice Schoenauer Sebag, Olivia Redfield, Matthias Eck, Emily Sheng, Luca Belli


Abstract
Harmful content detection models tend to have higher false positive rates for content from marginalized groups. In the context of marginal abuse modeling on Twitter, such disproportionate penalization poses the risk of reduced visibility, where marginalized communities lose the opportunity to voice their opinions on the platform. Current approaches to algorithmic harm mitigation and bias detection for NLP models are often ad hoc and subject to human bias. We make two main contributions in this paper. First, we design a novel methodology that provides a principled approach to detecting and measuring the severity of potential harms associated with a text-based model. Second, we apply our methodology to audit Twitter’s English marginal abuse model, which is used for removing amplification eligibility of marginally abusive content. Without utilizing demographic labels or dialect classifiers, we are still able to detect and measure the severity of issues related to the over-penalization of the speech of marginalized communities, such as the use of reclaimed speech, counterspeech, and identity-related terms. To mitigate the associated harms, we experiment with adding additional true negative examples and find that doing so improves our fairness metrics without large degradations in model performance.
Anthology ID:
2023.trustnlp-1.10
Volume:
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anaelia Ovalle, Kai-Wei Chang, Ninareh Mehrabi, Yada Pruksachatkun, Aram Galystan, Jwala Dhamala, Apurv Verma, Trista Cao, Anoop Kumar, Rahul Gupta
Venue:
TrustNLP
Publisher:
Association for Computational Linguistics
Pages:
108–120
URL:
https://aclanthology.org/2023.trustnlp-1.10
DOI:
10.18653/v1/2023.trustnlp-1.10
Cite (ACL):
Kyra Yee, Alice Schoenauer Sebag, Olivia Redfield, Matthias Eck, Emily Sheng, and Luca Belli. 2023. A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Models on Twitter. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 108–120, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Keyword Based Approach to Understanding the Overpenalization of Marginalized Groups by English Marginal Abuse Models on Twitter (Yee et al., TrustNLP 2023)
PDF:
https://aclanthology.org/2023.trustnlp-1.10.pdf