Sahil Jayaram
2021
Human Rationales as Attribution Priors for Explainable Stance Detection
Sahil Jayaram | Emily Allaway
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
As NLP systems become better at detecting opinions and beliefs from text, it is important to ensure not only that models are accurate but also that they arrive at their predictions in ways that align with human reasoning. In this work, we present a method for imparting human-like rationalization to a stance detection model using crowdsourced annotations on a small fraction of the training data. We show that in a data-scarce setting, our approach can improve the reasoning of a state-of-the-art classifier—particularly for inputs containing challenging phenomena such as sarcasm—at no cost in predictive performance. Furthermore, we demonstrate that attention weights surpass a leading attribution method in providing faithful explanations of our model’s predictions, thus serving as a computationally cheap and reliable source of attributions for our model.
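The idea of using human rationales as an attribution prior can be sketched as an extra penalty added to the usual classification loss. The sketch below is an illustrative assumption, not the paper's formulation: the function names, the mean-squared-error penalty, and the weight `lam` are all hypothetical. It only shows how a penalty could apply to the small annotated fraction of the training data while unannotated examples fall back to plain cross-entropy.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the gold stance label under predicted class probabilities."""
    return -math.log(probs[label])

def rationale_penalty(attention, rationale_mask):
    """Mean squared error between attention weights and a normalized rationale mask.

    `rationale_mask` has a 1 for each token annotators marked as relevant, else 0.
    The MSE form is an assumption chosen for simplicity.
    """
    total = sum(rationale_mask)
    if total == 0:
        return 0.0  # no tokens highlighted, so there is nothing to align to
    target = [m / total for m in rationale_mask]
    return sum((a - t) ** 2 for a, t in zip(attention, target)) / len(attention)

def training_loss(probs, label, attention, rationale_mask=None, lam=1.0):
    """Classification loss, plus the rationale penalty when an annotation exists."""
    loss = cross_entropy(probs, label)
    if rationale_mask is not None:  # only annotated examples carry the prior
        loss += lam * rationale_penalty(attention, rationale_mask)
    return loss

# Hypothetical example: a four-token input where annotators highlighted the first two tokens.
uniform_attention = [0.25, 0.25, 0.25, 0.25]
aligned_attention = [0.5, 0.5, 0.0, 0.0]
mask = [1, 1, 0, 0]
loss_unaligned = training_loss([0.2, 0.8], 1, uniform_attention, mask)
loss_aligned = training_loss([0.2, 0.8], 1, aligned_attention, mask)
```

Under these assumptions, attention that matches the highlighted tokens incurs no penalty, so `loss_aligned` is lower than `loss_unaligned` even though the predicted probabilities are identical.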