@inproceedings{treviso-martins-2020-explanation,
title = "The Explanation Game: Towards Prediction Explainability through Sparse Communication",
author = "Treviso, Marcos and
Martins, Andr{\'e} F. T.",
editor = "Alishahi, Afra and
Belinkov, Yonatan and
Chrupa{\l}a, Grzegorz and
Hupkes, Dieuwke and
Pinter, Yuval and
Sajjad, Hassan",
booktitle = "Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.blackboxnlp-1.10",
doi = "10.18653/v1/2020.blackboxnlp-1.10",
pages = "107--118",
abstract = "Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier{'}s decision. We use this framework to compare several explainers, including gradient methods, erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods in the light of classical feature selection, and use this as inspiration for new embedded explainers, through the use of selective, sparse attention. Experiments in text classification and natural language inference, using different configurations of explainers and laypeople (including both machines and humans), reveal an advantage of attention-based explainers over gradient and erasure methods, and show that selective attention is a simpler alternative to stochastic rationalizers. Human experiments show strong results on text classification with post-hoc explainers trained to optimize communication success.",
}