Enhancing textual counterfactual explanation intelligibility through Counterfactual Feature Importance

Milan Bhan, Jean-noel Vittaut, Nicolas Chesneau, Marie-jeanne Lesot


Abstract
Textual counterfactual examples explain a prediction by modifying the tokens of an initial instance in order to flip the outcome of a classifier. Even under a sparsity constraint, counterfactual generation can lead to numerous changes from the initial text, making the explanation hard to understand. We propose Counterfactual Feature Importance, a method to make non-sparse counterfactual explanations more intelligible. Counterfactual Feature Importance assesses the importance of each token change between the instance to explain and its counterfactual example. We develop two ways of computing Counterfactual Feature Importance, based respectively on classifier gradient computation and on the evolution of the counterfactual generator loss during the counterfactual search. We then design a global version of Counterfactual Feature Importance, providing rich information about the semantic fields that globally impact the classifier's predictions. Counterfactual Feature Importance makes it possible to focus on the impactful parts of counterfactual explanations, making explanations that involve numerous changes more understandable.
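The gradient-based variant described in the abstract can be illustrated with a minimal, self-contained sketch: each token position that differs between the original text and its counterfactual is scored by the dot product between the classifier gradient (with respect to the input embeddings) and the embedding shift at that position. The toy model, token ids, and function names below are hypothetical placeholders for illustration only, not the authors' implementation.

```python
# Minimal sketch of a gradient-based counterfactual feature importance score.
# Assumption: a differentiable text classifier and two aligned token-id
# sequences (original vs. counterfactual); only changed positions are scored.

import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Toy mean-pooled embedding classifier standing in for any differentiable text classifier."""
    def __init__(self, vocab_size=100, dim=16, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward_from_embeddings(self, embs):        # embs: (seq_len, dim)
        return self.head(embs.mean(dim=0))          # logits: (n_classes,)

def counterfactual_feature_importance(model, ids_orig, ids_cf, target_class):
    """Score each changed position by |grad . (e_cf - e_orig)| (gradient x embedding shift)."""
    e_orig = model.emb(ids_orig).detach().requires_grad_(True)
    logits = model.forward_from_embeddings(e_orig)
    logits[target_class].backward()                 # gradient of the target logit w.r.t. embeddings
    grad = e_orig.grad                              # (seq_len, dim)

    e_cf = model.emb(ids_cf).detach()
    scores = {}
    for pos, (t0, t1) in enumerate(zip(ids_orig.tolist(), ids_cf.tolist())):
        if t0 != t1:                                # only tokens changed by the counterfactual
            shift = e_cf[pos] - e_orig[pos].detach()
            scores[pos] = torch.abs(grad[pos] @ shift).item()
    return scores

# Usage: two aligned sequences differing at positions 1 and 3.
model = TinyClassifier()
ids_orig = torch.tensor([5, 12, 7, 30])
ids_cf   = torch.tensor([5, 41, 7, 8])
print(counterfactual_feature_importance(model, ids_orig, ids_cf, target_class=1))
```

Normalizing these per-position scores yields a ranking of the token changes, which is the kind of information the method uses to highlight the most impactful edits in a non-sparse counterfactual.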
Anthology ID:
2023.trustnlp-1.19
Volume:
Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anaelia Ovalle, Kai-Wei Chang, Ninareh Mehrabi, Yada Pruksachatkun, Aram Galstyan, Jwala Dhamala, Apurv Verma, Trista Cao, Anoop Kumar, Rahul Gupta
Venue:
TrustNLP
Publisher:
Association for Computational Linguistics
Pages:
221–231
URL:
https://aclanthology.org/2023.trustnlp-1.19
DOI:
10.18653/v1/2023.trustnlp-1.19
Cite (ACL):
Milan Bhan, Jean-noel Vittaut, Nicolas Chesneau, and Marie-jeanne Lesot. 2023. Enhancing textual counterfactual explanation intelligibility through Counterfactual Feature Importance. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 221–231, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Enhancing textual counterfactual explanation intelligibility through Counterfactual Feature Importance (Bhan et al., TrustNLP 2023)
PDF:
https://aclanthology.org/2023.trustnlp-1.19.pdf
Supplementary material:
 2023.trustnlp-1.19.SupplementaryMaterial.zip