Interpreting Predictions of NLP Models

Eric Wallace, Matt Gardner, Sameer Singh


Abstract
Although neural NLP models are highly expressive and empirically successful, they also systematically fail in counterintuitive ways and are opaque in their decision-making process. This tutorial will provide a background on interpretation techniques, i.e., methods for explaining the predictions of NLP models. We will first situate example-specific interpretations in the context of other ways to understand models (e.g., probing, dataset analyses). Next, we will present a thorough study of example-specific interpretations, including saliency maps, input perturbations (e.g., LIME, input reduction), adversarial attacks, and influence functions. Alongside these descriptions, we will walk through source code that creates and visualizes interpretations for a diverse set of NLP tasks. Finally, we will discuss open problems in the field, e.g., evaluating, extending, and improving interpretation methods.
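For concreteness, below is a minimal sketch of the simplest example-specific interpretation mentioned above, a vanilla gradient saliency map. It assumes PyTorch and Hugging Face Transformers and uses the publicly available distilbert-base-uncased-finetuned-sst-2-english sentiment model purely as an illustration; it is not the tutorial's own code, which covers a broader set of methods and visualizations.

# Minimal gradient-saliency sketch (assumptions: PyTorch + Hugging Face
# Transformers and an off-the-shelf SST-2 sentiment model; illustrative only,
# not the tutorial's source code).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")

# Embed the tokens ourselves so we can take gradients w.r.t. the embeddings.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
pred_class = outputs.logits.argmax(dim=-1).item()

# Backpropagate the predicted class's logit to the input embeddings.
outputs.logits[0, pred_class].backward()

# Saliency of each token = L2 norm of the gradient of its embedding.
saliency = embeddings.grad[0].norm(dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token:>12}  {score:.4f}")

The other methods listed in the abstract (LIME, input reduction, adversarial attacks, influence functions) follow the same goal of attributing a single prediction to parts of the input or the training data, but use perturbation or training-set machinery rather than a single backward pass.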
Anthology ID: 2020.emnlp-tutorials.3
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month: November
Year: 2020
Address: Online
Editors: Aline Villavicencio, Benjamin Van Durme
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 20–23
URL: https://aclanthology.org/2020.emnlp-tutorials.3
DOI: 10.18653/v1/2020.emnlp-tutorials.3
Cite (ACL): Eric Wallace, Matt Gardner, and Sameer Singh. 2020. Interpreting Predictions of NLP Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pages 20–23, Online. Association for Computational Linguistics.
Cite (Informal): Interpreting Predictions of NLP Models (Wallace et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-tutorials.3.pdf