BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers
Enja Kokalj | Blaž Škrlj | Nada Lavrač | Senja Pollak | Marko Robnik-Šikonja
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation
Transformer-based neural networks offer very good classification performance across a wide range of domains, but do not provide explanations of their predictions. While several explanation methods, including SHAP, address the problem of interpreting deep learning models, they are not adapted to operate on state-of-the-art transformer-based neural networks such as BERT. Another shortcoming of these methods is that their visualization of explanations in the form of lists of most relevant words does not take into account the sequential and structurally dependent nature of text. This paper proposes the TransSHAP method that adapts SHAP to transformer models including BERT-based text classifiers. It advances SHAP visualizations by showing explanations in a sequential manner, assessed by human evaluators as competitive to state-of-the-art solutions.