Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks

Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, Navid Rekabsaz


Abstract
Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria, and update the model to reach a new debiased state. However, in practice, end-users and practitioners might prefer to switch back to the original model, or apply debiasing only to a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach, consisting of stand-alone highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on demand at inference time. Our approach draws from the concept of diff pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct experiments on three classification tasks with gender, race, and age as protected attributes. The results show that our modular approach, while maintaining task performance, improves (or at least remains on par with) the effectiveness of bias mitigation in comparison with baseline fine-tuning. Particularly on a two-attribute dataset, our approach with separately learned debiasing subnetworks shows effective utilization of either or both of the subnetworks for selective bias mitigation.
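The on-demand integration described in the abstract can be pictured with a small sketch. This is not the authors' implementation: it assumes each debiasing subnetwork is stored as a sparse set of parameter differences (in the spirit of diff pruning) and that several subnetworks are combined by simple addition, which the abstract does not specify. The names `apply_subnetworks`, `base`, and `diffs`, as well as the weight values, are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical flat view of a model: parameter name -> weight.
Params = Dict[str, float]
# A sparse subnetwork stores only the few entries that differ from the core model.
SparseDiff = Dict[str, float]


def apply_subnetworks(
    base: Params,
    diffs: Dict[str, SparseDiff],
    active: Iterable[str],
) -> Params:
    """Return a copy of `base` with the selected sparse diffs added.

    Assumption: diffs are combined additively. The core model is never
    modified, so switching back to it means passing an empty selection.
    """
    merged = dict(base)  # shallow copy keeps the core model intact
    for attribute in active:
        for name, delta in diffs[attribute].items():
            merged[name] += delta
    return merged


# Toy core model and two separately learned (here: made-up) subnetworks.
base = {"w0": 1.0, "w1": -0.5, "w2": 0.25, "w3": 2.0}
diffs = {
    "gender": {"w1": 0.5},   # touches a single weight
    "age": {"w3": -1.0},     # touches a different single weight
}

original = apply_subnetworks(base, diffs, [])
gender_only = apply_subnetworks(base, diffs, ["gender"])
both = apply_subnetworks(base, diffs, ["gender", "age"])
```

Because the core weights are copied rather than overwritten, selecting no attribute recovers the original model, and any subset of attributes can be chosen per request at inference time.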
Anthology ID:
2023.findings-acl.386
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6192–6214
URL:
https://aclanthology.org/2023.findings-acl.386
DOI:
10.18653/v1/2023.findings-acl.386
Cite (ACL):
Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, and Navid Rekabsaz. 2023. Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6192–6214, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks (Hauzenberger et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.386.pdf