Mitigate Extrinsic Social Bias in Pre-trained Language Models via Continuous Prompts Adjustment

Yiwei Dai, Hengrui Gu, Ying Wang, Xin Wang


Abstract
Although pre-trained language models (PLMs) have been widely used in natural language understanding (NLU), they are still exposed to fairness issues. Most existing extrinsic debiasing methods rely on manually curated word lists for each sensitive group, either to modify training data or to add regularization constraints. However, these word lists are often limited in length and scope, resulting in degraded extrinsic bias mitigation. To address these issues, we propose a Continuous Prompts Adjustment Debiasing method (CPAD), which generates continuous token lists from the entire vocabulary space and uses them to bridge the gap between outputs and targets in the fairness learning process. Specifically, CPAD encapsulates the fine-tuning objective and the debiasing objectives into several independent prompts. To avoid the limitations of manual word lists, in the fairness learning phase we extract outputs over the entire vocabulary space via the fine-tuned PLM. We then aggregate the outputs belonging to the same sensitive group into continuous token lists that map the outputs to protected attribute labels. Finally, after learning the debiasing prompts from an adversarial learning perspective, we improve fairness by adjusting the continuous prompts at model inference time. Through extensive experiments on three NLU tasks, we evaluate debiasing performance from the perspectives of group fairness and fairness through unawareness. The experimental results show that CPAD outperforms all baselines in terms of both single-attribute and two-attribute debiasing performance.
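The abstract describes the approach at a high level: trainable continuous prompts carry the debiasing objective, learned adversarially against a protected-attribute predictor, and are adjusted at inference time. Below is a minimal PyTorch sketch of that general recipe, not the authors' CPAD implementation: a soft prompt prepended to input embeddings with a gradient-reversal adversary. The class names (`SoftPromptDebiaser`, `GradReverse`), the mean-pooling stand-in for a frozen PLM encoder, and all hyperparameters are illustrative assumptions.

```python
# Sketch only: soft-prompt debiasing with an adversarial attribute head.
# The frozen-PLM encoder is replaced by mean pooling for self-containment.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; scales and flips the gradient on the
    backward pass, so the adversary's loss *removes* attribute information."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None


class SoftPromptDebiaser(nn.Module):
    def __init__(self, hidden=768, n_prompt_tokens=8, n_labels=2, n_attrs=2):
        super().__init__()
        # Continuous (soft) prompt: trainable embeddings prepended to inputs.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)
        self.task_head = nn.Linear(hidden, n_labels)   # downstream NLU label
        self.attr_head = nn.Linear(hidden, n_attrs)    # protected-attribute adversary

    def forward(self, input_embeds, lam=1.0):
        b = input_embeds.size(0)
        x = torch.cat(
            [self.prompt.unsqueeze(0).expand(b, -1, -1), input_embeds], dim=1
        )
        # A frozen PLM encoder would process `x` here; mean-pool as a stand-in.
        pooled = x.mean(dim=1)
        task_logits = self.task_head(pooled)
        attr_logits = self.attr_head(GradReverse.apply(pooled, lam))
        return task_logits, attr_logits


# Illustrative training step: minimize task loss plus adversarial attribute loss.
model = SoftPromptDebiaser()
embeds = torch.randn(4, 16, 768)            # batch of input embeddings
labels = torch.randint(0, 2, (4,))          # task labels
attrs = torch.randint(0, 2, (4,))           # protected-attribute labels
task_logits, attr_logits = model(embeds)
loss = nn.functional.cross_entropy(task_logits, labels) \
     + nn.functional.cross_entropy(attr_logits, attrs)
loss.backward()
```

Under these assumptions, gradient reversal pushes the soft prompt to make the pooled representation uninformative about the protected attribute, while the task head keeps it useful for the downstream label; at inference time only the learned prompt needs to be adjusted, leaving the PLM itself untouched.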
Anthology ID:
2024.emnlp-main.620
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11068–11083
URL:
https://aclanthology.org/2024.emnlp-main.620
Cite (ACL):
Yiwei Dai, Hengrui Gu, Ying Wang, and Xin Wang. 2024. Mitigate Extrinsic Social Bias in Pre-trained Language Models via Continuous Prompts Adjustment. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11068–11083, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Mitigate Extrinsic Social Bias in Pre-trained Language Models via Continuous Prompts Adjustment (Dai et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.620.pdf