A Parameter-Efficient Multi-Objective Approach to Mitigate Stereotypical Bias in Language Models

Yifan Wang, Vera Demberg


Abstract
Pre-trained language models have shown impressive abilities to understand and generate natural language. However, they typically inherit undesired human-like biases and stereotypes from their training data, which raises concerns about deploying these models in real-world scenarios. Although prior research has proposed reducing bias with various fairness objectives, these methods usually fail to capture different representations of bias and therefore struggle to fully debias models. In this work, we introduce a multi-objective probability alignment approach that overcomes these challenges by incorporating multiple debiasing losses to locate and penalize bias in its different forms. Compared to existing methods, our proposed method reduces stereotypical bias more effectively and comprehensively while maintaining the language ability of pre-trained models. In addition, we adopt prefix-tuning to optimize the fairness objectives, and results show that it achieves better bias removal than full fine-tuning while requiring far fewer computational resources. Our code and data are available at https://github.com/Ewanwong/debias_NLG.
Anthology ID:
2024.gebnlp-1.1
Volume:
Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Seraphina Goldfarb-Tarrant, Debora Nozza
Venues:
GeBNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
1–19
URL:
https://aclanthology.org/2024.gebnlp-1.1
DOI:
10.18653/v1/2024.gebnlp-1.1
Cite (ACL):
Yifan Wang and Vera Demberg. 2024. A Parameter-Efficient Multi-Objective Approach to Mitigate Stereotypical Bias in Language Models. In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 1–19, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
A Parameter-Efficient Multi-Objective Approach to Mitigate Stereotypical Bias in Language Models (Wang & Demberg, GeBNLP-WS 2024)
PDF:
https://aclanthology.org/2024.gebnlp-1.1.pdf