Flexible text generation for counterfactual fairness probing

Zee Fryer, Vera Axelrod, Ben Packer, Alex Beutel, Jilin Chen, Kellie Webster


Abstract
A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier's output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates. They produce simple counterfactuals that fail to take into account grammar, context, or subtle references to sensitive attributes, and they can miss issues that the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to accomplish this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.
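To make the counterfactual test concrete, the sketch below shows a wordlist-based baseline of the kind the abstract contrasts against. It is not the paper's LLM-based method. The wordlist, the function names `wordlist_counterfactuals` and `counterfactual_gap`, and the toy classifier are hypothetical illustrations; a real probe would call an actual toxicity classifier.

```python
import re
from typing import Callable, Dict, List

# Hypothetical wordlist: each sensitive-attribute term maps to its counterfactual swaps.
WORDLIST: Dict[str, List[str]] = {
    "muslim": ["christian", "jewish"],
    "gay": ["straight"],
}


def wordlist_counterfactuals(text: str) -> List[str]:
    """Swap whole-word, case-insensitive wordlist matches to build counterfactuals."""
    outputs: List[str] = []
    for term, swaps in WORDLIST.items():
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        if pattern.search(text):
            for swap in swaps:
                # Plain string replacement: casing and grammar are not adjusted.
                outputs.append(pattern.sub(swap, text))
    return outputs


def counterfactual_gap(classify: Callable[[str], float], text: str) -> float:
    """Largest absolute change in classifier score over all counterfactuals (0.0 if none)."""
    base = classify(text)
    return max(
        (abs(classify(cf) - base) for cf in wordlist_counterfactuals(text)),
        default=0.0,
    )


if __name__ == "__main__":
    # Hypothetical toy scorer standing in for a real toxicity classifier.
    toy_classifier = lambda t: 1.0 if "gay" in t.lower() else 0.0
    print(counterfactual_gap(toy_classifier, "I am gay"))
```

With the toy scorer, "I am gay" yields the single counterfactual "I am straight" and a gap of 1.0, flagging a score change driven only by the identity term. The sketch also exhibits the limitations the abstract describes: the whole-word pattern does not match the plural "Muslims", and a capitalized "Muslim" is replaced by a lowercase "christian", so grammar and context are ignored.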
Anthology ID: 2022.woah-1.20
Volume: Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)
Month: July
Year: 2022
Address: Seattle, Washington (Hybrid)
Editors: Kanika Narang, Aida Mostafazadeh Davani, Lambert Mathias, Bertie Vidgen, Zeerak Talat
Venue: WOAH
Publisher: Association for Computational Linguistics
Pages: 209–229
URL: https://aclanthology.org/2022.woah-1.20
DOI: 10.18653/v1/2022.woah-1.20
Cite (ACL): Zee Fryer, Vera Axelrod, Ben Packer, Alex Beutel, Jilin Chen, and Kellie Webster. 2022. Flexible text generation for counterfactual fairness probing. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 209–229, Seattle, Washington (Hybrid). Association for Computational Linguistics.
Cite (Informal): Flexible text generation for counterfactual fairness probing (Fryer et al., WOAH 2022)
PDF: https://aclanthology.org/2022.woah-1.20.pdf
Data: Civil Comments