Optimising Equal Opportunity Fairness in Model Training

Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann


Abstract
Real-world datasets often encode stereotypes and societal biases. Such biases can be implicitly captured by trained models, leading to biased predictions and exacerbating existing societal preconceptions. Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias. However, a disconnect between fairness criteria and training objectives makes it difficult to reason theoretically about the effectiveness of different techniques. In this work, we propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
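As a rough illustration of the idea of building a fairness criterion directly into the training objective, the sketch below augments a standard cross-entropy loss with a differentiable penalty on the gap between group-conditional losses over positive examples, a common surrogate for the equal opportunity (true positive rate) gap. The function name, the lambda weight, and the specific surrogate are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def equal_opportunity_loss(logits, labels, groups, lam=1.0):
    """Cross-entropy plus a differentiable equal-opportunity penalty (sketch).

    Penalises the distance between each protected group's mean loss on
    positive (label == 1) examples and the overall mean loss on positives,
    a differentiable proxy for differences in true positive rates.
    Hypothetical illustration only, not the paper's objective.
    """
    ce = F.cross_entropy(logits, labels, reduction="none")
    positives = labels == 1
    overall = ce[positives].mean() if positives.any() else logits.new_zeros(())
    penalty = logits.new_zeros(())
    for g in torch.unique(groups):
        mask = positives & (groups == g)
        if mask.any():
            # gap between this group's loss on positives and the overall loss
            penalty = penalty + (ce[mask].mean() - overall).abs()
    return ce.mean() + lam * penalty
```

In such a setup, this loss would replace plain cross-entropy during training, with lam trading off task accuracy against the equal opportunity gap.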
Anthology ID:
2022.naacl-main.299
Volume:
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
July
Year:
2022
Address:
Seattle, United States
Editors:
Marine Carpuat, Marie-Catherine de Marneffe, Ivan Vladimir Meza Ruiz
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4073–4084
URL:
https://aclanthology.org/2022.naacl-main.299
DOI:
10.18653/v1/2022.naacl-main.299
Cite (ACL):
Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, and Lea Frermann. 2022. Optimising Equal Opportunity Fairness in Model Training. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4073–4084, Seattle, United States. Association for Computational Linguistics.
Cite (Informal):
Optimising Equal Opportunity Fairness in Model Training (Shen et al., NAACL 2022)
PDF:
https://aclanthology.org/2022.naacl-main.299.pdf
Video:
https://aclanthology.org/2022.naacl-main.299.mp4
Code:
ailiaili/difference_mean_fair_models