Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT

Jaimeen Ahn, Hwaran Lee, Jinhwa Kim, Alice Oh


Abstract
Knowledge distillation is widely used to transfer the language understanding of a large model to a smaller one. However, the distilled model has been found to exhibit more gender bias than its large source model. This paper studies what causes gender bias to increase during the knowledge distillation process. We then propose applying a variant of mixup during knowledge distillation, used to increase generalizability in the distillation process rather than as data augmentation. Doing so significantly reduces the amplification of gender bias after distillation. Experiments on the GLUE benchmark further show that applying mixup has no significant adverse effect on the model's performance.
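To make the idea concrete, the snippet below is a minimal, generic sketch of mixup-style interpolation applied inside a distillation step, not the paper's exact variant: pairs of examples in a batch are interpolated with a Beta-sampled coefficient, and the student is trained to match the teacher's temperature-softened predictions on the mixed inputs. The function name mixup_distillation_step, the stand-in teacher/student modules, and the hyperparameters (alpha, temperature) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (illustrative assumption, not the authors' exact method):
# mixup-style interpolation applied during a knowledge-distillation step.
import torch
import torch.nn.functional as F

def mixup_distillation_step(student, teacher, inputs, alpha=0.4, temperature=2.0):
    """One distillation step on mixed inputs.

    student, teacher: callables mapping input features to logits (assumed).
    inputs:           a (batch, ...) tensor of input features or embeddings.
    alpha:            Beta(alpha, alpha) parameter for the mixing coefficient.
    temperature:      softmax temperature for soft-target distillation.
    """
    # Sample a mixing coefficient and a random pairing of examples in the batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(inputs.size(0))

    # Mixup applied inside the distillation step, not as separate data augmentation.
    mixed = lam * inputs + (1.0 - lam) * inputs[perm]

    with torch.no_grad():
        teacher_logits = teacher(mixed)  # soft targets computed on the mixed inputs
    student_logits = student(mixed)

    # Standard soft-target distillation loss (temperature-scaled KL divergence).
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return loss

# Toy usage with stand-in linear "models" (purely illustrative).
teacher = torch.nn.Linear(16, 3)
student = torch.nn.Linear(16, 3)
loss = mixup_distillation_step(student, teacher, torch.randn(8, 16))
loss.backward()
```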
Anthology ID: 2022.gebnlp-1.27
Volume: Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Month: July
Year: 2022
Address: Seattle, Washington
Editors: Christian Hardmeier, Christine Basta, Marta R. Costa-jussà, Gabriel Stanovsky, Hila Gonen
Venue: GeBNLP
Publisher: Association for Computational Linguistics
Pages: 266–272
URL: https://aclanthology.org/2022.gebnlp-1.27
DOI: 10.18653/v1/2022.gebnlp-1.27
Cite (ACL): Jaimeen Ahn, Hwaran Lee, Jinhwa Kim, and Alice Oh. 2022. Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT. In Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 266–272, Seattle, Washington. Association for Computational Linguistics.
Cite (Informal): Why Knowledge Distillation Amplifies Gender Bias and How to Mitigate from the Perspective of DistilBERT (Ahn et al., GeBNLP 2022)
PDF: https://aclanthology.org/2022.gebnlp-1.27.pdf
Data: GLUE