Contrastive Learning as a Polarizer: Mitigating Gender Bias by Fair and Biased sentences

Kyungmin Park, Sihyun Oh, Daehyun Kim, Juae Kim


Abstract
Recently, language models have accelerated progress in natural language processing. However, recent studies have highlighted a significant issue: social biases inherent in training data can lead models to learn and propagate those biases. In this study, we propose a contrastive learning method for bias mitigation that uses anchor points to push negatives farther away and pull positives closer within the representation space. This approach employs stereotypical data as negatives and stereotype-free data as positives, enhancing debiasing performance. Our model attained state-of-the-art performance on the ICAT score of StereoSet, a benchmark for measuring bias in language models. In addition, we observed that effective debiasing is achieved through an awareness of bias, as evidenced by improved hate speech detection scores. The implementation code and trained models are available at https://github.com/HUFS-NLP/CL_Polarizer.git.
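To make the contrastive objective described in the abstract concrete, the following is a minimal sketch, not the authors' released implementation, of an InfoNCE-style loss in PyTorch in which each anchor embedding is pulled toward a stereotype-free (positive) sentence embedding and pushed away from a stereotypical (negative) one. The function name, the temperature value, and the random tensors standing in for sentence-encoder outputs are illustrative assumptions.

# Minimal sketch (assumed, not the paper's exact code): anchor embeddings are
# pulled toward stereotype-free positives and pushed away from stereotypical
# negatives via a cross-entropy over similarity logits.
import torch
import torch.nn.functional as F

def polarizing_contrastive_loss(anchor, positive, negative, temperature=0.05):
    """anchor, positive, negative: (batch, dim) sentence embeddings."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negative = F.normalize(negative, dim=-1)

    # Cosine similarities scaled by temperature.
    sim_pos = (anchor * positive).sum(dim=-1) / temperature  # (batch,)
    sim_neg = (anchor * negative).sum(dim=-1) / temperature  # (batch,)

    # Treat the stereotype-free positive as the "correct" class for each anchor.
    logits = torch.stack([sim_pos, sim_neg], dim=1)           # (batch, 2)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    # Toy usage: random embeddings stand in for encoder outputs.
    batch, dim = 8, 768
    anchor = torch.randn(batch, dim)
    positive = torch.randn(batch, dim)
    negative = torch.randn(batch, dim)
    print(polarizing_contrastive_loss(anchor, positive, negative))

Minimizing this loss increases the anchor-positive similarity while decreasing the anchor-negative similarity, which is the "polarizing" push-pull effect the abstract describes.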
Anthology ID: 2024.findings-naacl.293
Volume: Findings of the Association for Computational Linguistics: NAACL 2024
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4725–4736
URL: https://aclanthology.org/2024.findings-naacl.293
DOI: 10.18653/v1/2024.findings-naacl.293
Cite (ACL):
Kyungmin Park, Sihyun Oh, Daehyun Kim, and Juae Kim. 2024. Contrastive Learning as a Polarizer: Mitigating Gender Bias by Fair and Biased sentences. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4725–4736, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Contrastive Learning as a Polarizer: Mitigating Gender Bias by Fair and Biased sentences (Park et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-naacl.293.pdf