Towards Benchmarking the Utility of Explanations for Model Debugging

Maximilian Idahl, Lijun Lyu, Ujwal Gadiraju, Avishek Anand


Abstract
Post-hoc explanation methods are an important class of approaches that help users understand the rationale underlying a trained model's decisions. But how useful are these explanations to an end-user trying to accomplish a given task? In this vision paper, we argue for a benchmark to facilitate evaluating the utility of post-hoc explanation methods. As a first step toward this end, we enumerate desirable properties that such a benchmark should possess for the task of debugging text classifiers. We further highlight that such a benchmark facilitates assessing not only the effectiveness of explanations but also their efficiency.
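
To make the debugging setting concrete: a post-hoc explanation assigns each input token a score reflecting its influence on the classifier's prediction, and a developer inspects these scores for suspicious patterns. The sketch below is a hypothetical illustration, not part of the paper: the toy data, the planted spurious "subscribe" token, and the explain helper are all assumptions, and coefficient-times-count attribution on a linear model stands in for the richer post-hoc methods such a benchmark would evaluate.

# Minimal sketch (hypothetical, not from the paper): a bag-of-words
# classifier with a planted spurious feature, plus a simple post-hoc
# attribution a developer could use to spot the bug.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: the token "subscribe" spuriously co-occurs with
# the positive label.
train_texts = [
    "great movie subscribe",
    "wonderful acting subscribe",
    "terrible plot",
    "boring and dull",
]
train_labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)
clf = LogisticRegression().fit(X, train_labels)

def explain(text):
    # Score each token by coefficient * count, a simple additive
    # attribution for a linear model; tokens sorted by |score|.
    counts = vectorizer.transform([text]).toarray()[0]
    vocab = vectorizer.get_feature_names_out()
    scores = counts * clf.coef_[0]
    return sorted(
        ((tok, round(s, 3)) for tok, s in zip(vocab, scores) if s != 0),
        key=lambda pair: -abs(pair[1]),
    )

# "subscribe" dominates the attribution despite being sentiment-neutral,
# flagging the spurious correlation the model picked up.
print(explain("dull movie subscribe"))

In this setup, the explanation surfaces the sentiment-neutral token with the largest score, which is exactly the kind of model bug a utility benchmark would test whether explanations help end-users find, and how quickly.
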
Anthology ID: 2021.trustnlp-1.8
Volume: Proceedings of the First Workshop on Trustworthy Natural Language Processing
Month: June
Year: 2021
Address: Online
Editors: Yada Pruksachatkun, Anil Ramakrishna, Kai-Wei Chang, Satyapriya Krishna, Jwala Dhamala, Tanaya Guha, Xiang Ren
Venue: TrustNLP
Publisher: Association for Computational Linguistics
Pages: 68–73
URL: https://aclanthology.org/2021.trustnlp-1.8
DOI: 10.18653/v1/2021.trustnlp-1.8
Cite (ACL): Maximilian Idahl, Lijun Lyu, Ujwal Gadiraju, and Avishek Anand. 2021. Towards Benchmarking the Utility of Explanations for Model Debugging. In Proceedings of the First Workshop on Trustworthy Natural Language Processing, pages 68–73, Online. Association for Computational Linguistics.
Cite (Informal): Towards Benchmarking the Utility of Explanations for Model Debugging (Idahl et al., TrustNLP 2021)
PDF: https://aclanthology.org/2021.trustnlp-1.8.pdf
Data: SST