D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias

Sabit Hassan; Malihe Alikhani

doi:10.18653/v1/2023.findings-acl.342

Abstract

Despite recent advancements, NLP models continue to be vulnerable to bias. This bias often originates from the uneven distribution of real-world data and can propagate through the annotation process. The growing integration of these models into our lives calls for methods that mitigate bias without incurring excessive annotation costs. While active learning (AL) has shown promise in training models with a small amount of annotated data, AL’s reliance on the model’s behavior for selective sampling can lead to an accumulation of unwanted bias rather than bias mitigation. However, combining clustering with AL can overcome the bias issues of both AL and traditional annotation methods while retaining AL’s annotation efficiency. In this paper, we propose a novel adaptive clustering-based active learning algorithm, D-CALM, that dynamically adjusts clustering and annotation efforts in response to an estimated classifier error rate. Experiments on eight datasets for a diverse set of text classification tasks, including emotion, hate speech, dialog act, and book type detection, demonstrate that our proposed algorithm significantly outperforms baseline AL approaches with both pretrained transformers and traditional Support Vector Machines. D-CALM is robust to different measures of information gain and, as evident from our analysis of label and error distribution, can significantly reduce unwanted model bias.
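
To make the idea of steering annotation effort by estimated error rate more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the allocation rule, so the proportional split, the uncertainty criterion (distance from 0.5 for a binary classifier), and the helper names `allocate_budget` and `select_most_uncertain` are all assumptions made for illustration.

```python
from typing import Dict, List


def allocate_budget(error_rates: Dict[int, float], budget: int) -> Dict[int, int]:
    """Split an annotation budget across clusters by estimated error rate.

    Hypothetical helper: the proportional rule is an assumption,
    not the rule used in the paper.
    """
    total = sum(error_rates.values())
    if total == 0:
        # No observed errors anywhere: fall back to an even split.
        weights = {c: 1.0 / len(error_rates) for c in error_rates}
    else:
        weights = {c: e / total for c, e in error_rates.items()}
    # Floor each share, then give leftover labels to the highest-error clusters.
    alloc = {c: int(budget * w) for c, w in weights.items()}
    leftover = budget - sum(alloc.values())
    for c in sorted(error_rates, key=error_rates.get, reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc


def select_most_uncertain(probs: List[float], k: int) -> List[int]:
    """Return indices of the k samples whose positive-class probability
    is closest to 0.5 (assumed binary uncertainty sampling)."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]
```

Under these assumptions, a cluster whose estimated error rate is twice that of another would receive roughly twice as many annotation slots, and within each cluster the slots would go to the samples the classifier is least sure about.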

Anthology ID: 2023.findings-acl.342
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5540–5553
URL: https://aclanthology.org/2023.findings-acl.342/
DOI: 10.18653/v1/2023.findings-acl.342
Cite (ACL): Sabit Hassan and Malihe Alikhani. 2023. D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5540–5553, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): D-CALM: A Dynamic Clustering-based Active Learning Approach for Mitigating Bias (Hassan & Alikhani, Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.342.pdf
Video: https://aclanthology.org/2023.findings-acl.342.mp4
