SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers

Dheeraj Rajagopal, Vidhisha Balachandran, Eduard H Hovy, Yulia Tsvetkov


Abstract
We introduce SelfExplain, a novel self-explaining model that explains a text classifier’s predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain show sufficiency for model predictions and are perceived as adequate, trustworthy and understandable by human judges compared to existing widely-used baselines.
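The abstract's locally interpretable layer assigns each input phrase a relevance score relative to the predicted label. As an illustration only (not the paper's actual mechanism, which operates on internal phrase representations), a hedged sketch of label-relative phrase relevance via ablation, using a toy stand-in classifier `label_scores` (a hypothetical name, not from the paper):

```python
def label_scores(tokens):
    # Toy stand-in for a neural classifier: scores each label by
    # counting sentiment cue words. A real model would return logits.
    pos = {"good", "great"}
    neg = {"bad", "awful"}
    p = sum(t in pos for t in tokens)
    n = sum(t in neg for t in tokens)
    return {"positive": p - n, "negative": n - p}

def relevance(tokens, phrase, label):
    """Contribution of `phrase` to `label`: the score drop when
    the phrase's tokens are removed from the input."""
    full = label_scores(tokens)[label]
    reduced = label_scores([t for t in tokens if t not in set(phrase)])[label]
    return full - reduced

sentence = ["the", "movie", "was", "great"]
print(relevance(sentence, ["great"], "positive"))  # -> 1
```

This ablation view conveys the intuition of per-concept relevance; SelfExplain itself computes such scores inside the network rather than by re-running the classifier per phrase.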
Anthology ID:
2021.emnlp-main.64
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
836–850
URL:
https://aclanthology.org/2021.emnlp-main.64
DOI:
10.18653/v1/2021.emnlp-main.64
Cite (ACL):
Dheeraj Rajagopal, Vidhisha Balachandran, Eduard H Hovy, and Yulia Tsvetkov. 2021. SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 836–850, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers (Rajagopal et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.64.pdf
Video:
https://aclanthology.org/2021.emnlp-main.64.mp4
Code:
dheerajrajagopal/SelfExplain (+ additional community code)
Data:
SST