Jianfei Yang
2025
Identifying and Mitigating Social Bias Knowledge in Language Models
Ruizhe Chen | Yichen Li | Jianfei Yang | Yang Feng | Joey Tianyi Zhou | Jian Wu | Zuozhu Liu
Findings of the Association for Computational Linguistics: NAACL 2025
Generating fair and accurate predictions plays a pivotal role in deploying pre-trained language models (PLMs) in the real world. However, existing debiasing methods can inadvertently produce incorrect or nonsensical predictions: they are designed and evaluated to achieve parity across social groups while leaving aside individual commonsense facts, so the modified knowledge can elicit unreasonable or undesired predictions. This paper introduces a novel debiasing framework that first identifies where biases are encoded within language models and then applies the Fairness-Stamp (FAST). FAST focuses on fine-grained, individual bias mitigation and integrates a lightweight network into PLMs, specifically targeting identified biases while preserving essential knowledge and maintaining factual integrity. We also present BiaScope, a new benchmark comprising datasets and metrics designed to evaluate the retention of commonsense knowledge and generalization to paraphrased social biases. Extensive experiments across multiple datasets demonstrate that FAST surpasses state-of-the-art baselines with superior debiasing performance while not compromising overall model capability in knowledge retention and downstream prediction. This highlights the potential of fine-grained debiasing strategies for achieving fairness in PLMs. Code will be publicly available.
2021
Unsupervised Energy-based Adversarial Domain Adaptation for Cross-domain Text Classification
Han Zou | Jianfei Yang | Xiaojian Wu
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021