Identifying and Mitigating Annotation Bias in Natural Language Understanding using Causal Mediation Analysis

Sitiporn Sae Lim, Can Udomcharoenchaikit, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong


Abstract
NLU models have achieved promising results on standard benchmarks. Despite state-of-the-art accuracy, analysis reveals that many models make predictions based on annotation bias rather than the properties we intend the model to learn. Consequently, these models perform poorly on out-of-distribution datasets. Recent advances in bias mitigation show that annotation bias can be alleviated by fine-tuning with debiasing objectives. In this paper, we apply causal mediation analysis to gauge how much each model component mediates annotation biases. Using the knowledge from the causal analysis, we improve the model’s robustness against annotation bias through two bias mitigation methods: causal-grounded masking and gradient unlearning. Causal analysis reveals that biases are concentrated in specific components, even after employing other training-time debiasing techniques. Manipulating these components, either by masking out neurons’ activations or by updating specific weight blocks, demonstrably improves robustness against annotation artifacts.
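The causal-grounded masking described in the abstract zeroes the activations of neurons identified as bias mediators. As a rough illustration only, the sketch below shows one way such masking could be wired up with a PyTorch forward hook; the toy model, the chosen layer, and the neuron indices are placeholders and do not reflect the authors' actual implementation.

```python
import torch
import torch.nn as nn

def mask_neuron_activations(module: nn.Module, neuron_indices):
    """Zero out selected neurons in `module`'s output via a forward hook.

    `neuron_indices` index the last (hidden) dimension of the output and
    stand in for neurons that a causal analysis flags as bias mediators.
    """
    def hook(_module, _inputs, output):
        masked = output.clone()
        masked[..., neuron_indices] = 0.0  # suppress the flagged activations
        return masked                      # returned value replaces the module output
    return module.register_forward_hook(hook)

# Toy usage: a stand-in classifier; indices 3 and 7 are hypothetical mediators.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
handle = mask_neuron_activations(model[1], neuron_indices=[3, 7])

logits = model(torch.randn(4, 16))  # forward pass with masked activations
handle.remove()                     # detach the hook to restore original behavior
```

In a setup like this, masking acts purely at inference time; the gradient-unlearning variant mentioned in the abstract would instead update the weight blocks associated with the identified components.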
Anthology ID:
2024.findings-acl.686
Volume:
Findings of the Association for Computational Linguistics ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand and virtual meeting
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11548–11563
URL:
https://aclanthology.org/2024.findings-acl.686
Cite (ACL):
Sitiporn Sae Lim, Can Udomcharoenchaikit, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, and Sarana Nutanong. 2024. Identifying and Mitigating Annotation Bias in Natural Language Understanding using Causal Mediation Analysis. In Findings of the Association for Computational Linguistics ACL 2024, pages 11548–11563, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Identifying and Mitigating Annotation Bias in Natural Language Understanding using Causal Mediation Analysis (Sae Lim et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-acl.686.pdf