Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification

Ruixuan Tang; Hanjie Chen; Yangfeng Ji

doi:10.18653/v1/2022.blackboxnlp-1.30

Abstract

Several recent works have observed that post-hoc explanations are unstable when input-side perturbations are applied to the model, which has raised both interest in and concern about the stability of post-hoc explanations. The remaining question is whether the instability is caused by the neural network model or by the post-hoc explanation method. This work explores the potential source of unstable post-hoc explanations. To separate out the influence of the model, we propose a simple output probability perturbation method. Compared with prior input-side perturbation methods, the output probability perturbation method circumvents the neural model’s potential effect on the explanations and allows the explanation method itself to be analyzed. We evaluate the proposed method with three widely used post-hoc explanation methods: LIME (Ribeiro et al., 2016), Kernel Shapley (Lundberg and Lee, 2017a), and Sample Shapley (Strumbelj and Kononenko, 2010). The results demonstrate that the post-hoc methods are stable, barely producing discrepant explanations under output probability perturbations. This observation suggests that neural network models may be the primary source of fragile explanations.
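
The idea of perturbing output probabilities rather than inputs can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy classifier and its weights, the Gaussian-noise-and-renormalize perturbation in `perturb_output`, the leave-one-out explainer (a simple stand-in, not LIME or a Shapley method), and the top-k overlap score used to compare explanations.

```python
import math
import random

# Hypothetical toy model: a bag-of-words logistic classifier with fixed weights.
WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "bad": -2.0}


def predict_proba(tokens):
    """Return [P(negative), P(positive)] for a list of tokens."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]


def perturb_output(probs, rng, scale=0.01):
    """Assumed perturbation: add Gaussian noise to each probability, then renormalize."""
    noisy = [max(p + rng.gauss(0.0, scale), 1e-9) for p in probs]
    total = sum(noisy)
    return [p / total for p in noisy]


def occlusion_explanation(tokens, proba_fn, label=1):
    """Stand-in explainer: importance of a token is the drop in P(label) when it is removed."""
    base = proba_fn(tokens)[label]
    return [
        base - proba_fn(tokens[:i] + tokens[i + 1:])[label]
        for i in range(len(tokens))
    ]


def top_k(scores, k):
    """Indices of the k features with the largest absolute importance."""
    ranked = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
    return set(ranked[:k])


def explanation_overlap(tokens, k=2, scale=0.01, seed=0):
    """Fraction of top-k features shared by the clean and output-perturbed explanations."""
    rng = random.Random(seed)
    clean = occlusion_explanation(tokens, predict_proba)
    # The model and its input are untouched; only the probabilities the explainer sees change.
    noisy = occlusion_explanation(
        tokens, lambda t: perturb_output(predict_proba(t), rng, scale)
    )
    return len(top_k(clean, k) & top_k(noisy, k)) / k


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great", "but", "boring"]
    print(explanation_overlap(sentence, k=2, scale=0.01))
```

Because the noise is injected after the model has produced its probabilities, any change in the resulting explanation in this sketch can only come from how the explainer reacts to the perturbed outputs, which is the separation the abstract describes.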

Anthology ID: 2022.blackboxnlp-1.30
Volume: Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Jasmijn Bastings, Yonatan Belinkov, Yanai Elazar, Dieuwke Hupkes, Naomi Saphra, Sarah Wiegreffe
Venue: BlackboxNLP
Publisher: Association for Computational Linguistics
Pages: 356–370
URL: https://aclanthology.org/2022.blackboxnlp-1.30/
DOI: 10.18653/v1/2022.blackboxnlp-1.30
Cite (ACL): Ruixuan Tang, Hanjie Chen, and Yangfeng Ji. 2022. Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification. In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 356–370, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification (Tang et al., BlackboxNLP 2022)
PDF: https://aclanthology.org/2022.blackboxnlp-1.30.pdf
