Kanruethai Masuk
2022
Mitigating Spurious Correlation in Natural Language Understanding with Counterfactual Inference
Can Udomcharoenchaikit
|
Wuttikorn Ponwitayarat
|
Patomporn Payoungkhamdee
|
Kanruethai Masuk
|
Weerayut Buaphet
|
Ekapol Chuangsuwanich
|
Sarana Nutanong
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Despite their promising results on standard benchmarks, NLU models are still prone to make predictions based on shortcuts caused by unintended bias in the dataset. For example, an NLI model may use lexical overlap as a shortcut to make entailment predictions due to repetitive data generation patterns from annotators, also called annotation artifacts. In this paper, we propose a causal analysis framework to help debias NLU models. We show that (1) by defining causal relationships, we can introspect how much annotation artifacts affect the outcomes. (2) We can utilize counterfactual inference to mitigate bias with this knowledge. We found that viewing a model as a treatment can mitigate bias more effectively than viewing annotation artifacts as treatment. (3) In addition to bias mitigation, we can interpret how much each debiasing strategy is affected by annotation artifacts. Our experimental results show that using counterfactual inference can improve out-of-distribution performance in all settings while maintaining high in-distribution performance.