Sustainable Modular Debiasing of Language Models

Anne Lauscher, Tobias Lueken, Goran Glavaš


Abstract
Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for the widespread adoption of state-of-the-art language technology. To remedy this, a wide range of debiasing techniques have recently been introduced to remove such stereotypical biases from PLMs. Existing debiasing methods, however, directly modify all of the PLM's parameters, which – besides being computationally expensive – comes with the inherent risk of (catastrophic) forgetting of useful language knowledge acquired in pretraining. In this work, we propose a more sustainable modular debiasing approach based on dedicated debiasing adapters, dubbed ADELE. Concretely, we (1) inject adapter modules into the original PLM layers and (2) update only the adapters (i.e., we keep the original PLM parameters frozen) via language modeling training on a counterfactually augmented corpus. We showcase ADELE in gender debiasing of BERT: our extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, renders ADELE very effective in bias mitigation. We further show that – due to its modular nature – ADELE, coupled with task adapters, retains fairness even after large-scale downstream training. Finally, by means of multilingual BERT, we successfully transfer ADELE to six target languages.
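The following is a minimal PyTorch sketch of the two-step recipe the abstract describes, not the authors' released implementation (available under Software below): a bottleneck adapter is hooked onto the output of each frozen BERT layer, and only the adapter parameters are updated via language modeling on sentences paired with counterfactually augmented copies. The adapter bottleneck size, the toy gender-pair lexicon, and the unmasked full-sentence LM loss are simplifying assumptions for illustration.

```python
# Illustrative sketch of adapter-based debiasing with a frozen PLM.
# Hypothetical choices: bottleneck size 64, a tiny CDA word-pair list,
# and an unmasked LM loss (real MLM training would mask tokens).
import torch
import torch.nn as nn
from transformers import BertForMaskedLM, BertTokenizerFast

# Toy counterfactual-augmentation lexicon (real CDA uses curated pair lists).
GENDER_PAIRS = {"he": "she", "she": "he", "him": "her", "her": "him",
                "man": "woman", "woman": "man"}

def counterfactual_augment(sentence: str) -> str:
    """Swap gendered terms to create a counterfactual copy of the sentence."""
    return " ".join(GENDER_PAIRS.get(tok, tok) for tok in sentence.split())

class BottleneckAdapter(nn.Module):
    """Down-projection, non-linearity, up-projection, residual connection."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Step 1 of the abstract's recipe: freeze all original PLM parameters ...
for p in model.parameters():
    p.requires_grad = False

# ... and inject an adapter after each transformer layer via a forward hook
# (returning a value from the hook replaces the layer's output).
adapters = nn.ModuleList()
for layer in model.bert.encoder.layer:
    adapter = BottleneckAdapter(model.config.hidden_size)
    adapters.append(adapter)
    layer.output.register_forward_hook(
        lambda module, inputs, output, a=adapter: a(output))

# Step 2: update only the adapter parameters via LM training on the
# original sentence and its counterfactual copy.
optimizer = torch.optim.AdamW(adapters.parameters(), lr=1e-4)
sentence = "he is a doctor"
for text in (sentence, counterfactual_augment(sentence)):
    batch = tokenizer(text, return_tensors="pt")
    labels = batch["input_ids"].clone()  # unmasked LM loss, for brevity
    loss = model(**batch, labels=labels).loss
    loss.backward()  # gradients reach only the adapters; the PLM is frozen
optimizer.step()
```

Because the base PLM never changes, the trained adapter weights form a small, portable debiasing module that can be stacked with task adapters, which is the modularity the abstract refers to.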
Anthology ID:
2021.findings-emnlp.411
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2021
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
Findings
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
4782–4797
URL:
https://aclanthology.org/2021.findings-emnlp.411
DOI:
10.18653/v1/2021.findings-emnlp.411
Cite (ACL):
Anne Lauscher, Tobias Lueken, and Goran Glavaš. 2021. Sustainable Modular Debiasing of Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4782–4797, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Sustainable Modular Debiasing of Language Models (Lauscher et al., Findings 2021)
PDF:
https://aclanthology.org/2021.findings-emnlp.411.pdf
Software:
2021.findings-emnlp.411.Software.zip
Video:
https://aclanthology.org/2021.findings-emnlp.411.mp4
Data:
MultiNLI