Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations

Julia El Zini, Mariette Awad


Abstract
Contrastive explanation methods go beyond transparency and address the contrastive aspect of explanations. Such explanations are emerging as an attractive way to offer actionable changes in scenarios adversely affected by classifiers’ decisions. However, their extension to textual data is under-explored, and their vulnerabilities and limitations have received little investigation. This work motivates textual counterfactuals by highlighting the social limitations of non-contrastive explainability. We also lay the groundwork for a novel evaluation scheme inspired by the faithfulness of explanations. Accordingly, we extend the computation of three metrics (proximity, connectedness, and stability) to textual data, and we benchmark two successful contrastive methods, POLYJUICE and MiCE, on these metrics. Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not evident for either method. More interestingly, the generated contrastive texts are more attainable with POLYJUICE, which highlights the significance of latent representations in counterfactual search. Finally, we perform the first semantic adversarial attack on textual recourse methods. The results demonstrate the robustness of POLYJUICE and the role that latent input representations play in robustness and reliability.
Anthology ID:
2022.findings-emnlp.100
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1391–1402
URL:
https://aclanthology.org/2022.findings-emnlp.100
DOI:
10.18653/v1/2022.findings-emnlp.100
Cite (ACL):
Julia El Zini and Mariette Awad. 2022. Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1391–1402, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations (El Zini & Awad, Findings 2022)
PDF:
https://aclanthology.org/2022.findings-emnlp.100.pdf