Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack

Liwen Wang, Yuanmeng Yan, Keqing He, Yanan Wu, Weiran Xu


Abstract
Representation learning is widely used in NLP for a vast range of tasks. However, representations derived from text corpora often reflect social biases. This phenomenon is pervasive and consistent across different neural models, causing serious concern. Previous methods mostly rely on a pre-specified, user-provided direction or suffer from unstable training. In this paper, we propose an adversarial disentangled debiasing model to dynamically decouple social bias attributes from the intermediate representations trained on the main task. We aim to denoise bias information while training on the downstream task, rather than completely removing social bias and pursuing static unbiased representations. Experiments show that our method is effective in terms of both debiasing and main-task performance.
Anthology ID:
2021.naacl-main.293
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3740–3750
URL:
https://aclanthology.org/2021.naacl-main.293
DOI:
10.18653/v1/2021.naacl-main.293
Cite (ACL):
Liwen Wang, Yuanmeng Yan, Keqing He, Yanan Wu, and Weiran Xu. 2021. Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3740–3750, Online. Association for Computational Linguistics.
Cite (Informal):
Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack (Wang et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.293.pdf
Video:
https://aclanthology.org/2021.naacl-main.293.mp4
Code:
w-lw/debias_adv