Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience

George Chrysostomou, Nikolaos Aletras


Abstract
Pretrained transformer-based models such as BERT have demonstrated state-of-the-art predictive performance when adapted to a range of natural language processing tasks. An open problem is how to improve the faithfulness of explanations (rationales) for the predictions of these models. In this paper, we hypothesize that salient information extracted a priori from the training data can complement the task-specific information learned by the model during fine-tuning on a downstream task. To help BERT keep assigning importance to informative input tokens when making predictions, we propose SaLoss, an auxiliary loss function that guides the multi-head attention mechanism during training to stay close to salient information extracted a priori using TextRank. Experiments on explanation faithfulness across five datasets show that models trained with SaLoss consistently provide more faithful explanations than vanilla BERT across four different feature attribution methods. Using the rationales extracted from vanilla BERT and SaLoss models to train inherently faithful classifiers, we further show that the latter achieve higher predictive performance on downstream tasks.
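The method described in the abstract amounts to adding an auxiliary term to the training objective that pulls the model's attention distribution over input tokens toward token salience scores precomputed with TextRank. The sketch below illustrates one such term in PyTorch; the function name, the KL-divergence formulation, and the weighting hyperparameter are illustrative assumptions, not the authors' exact SaLoss (see the linked code repository for the actual implementation).

import torch
import torch.nn.functional as F

def salience_loss(attention, salience, eps=1e-8):
    # Illustrative auxiliary loss (not the authors' exact SaLoss):
    # a KL term encouraging the attention distribution over tokens
    # to match a salience distribution precomputed with TextRank.
    # attention: (batch, seq_len) attention weights, e.g. averaged over heads
    # salience:  (batch, seq_len) non-negative per-token TextRank scores
    attn = attention.clamp_min(eps)
    attn = attn / attn.sum(dim=-1, keepdim=True)
    sal = salience.clamp_min(eps)
    sal = sal / sal.sum(dim=-1, keepdim=True)
    # F.kl_div expects log-probabilities as input and probabilities as target
    return F.kl_div(attn.log(), sal, reduction="batchmean")

if __name__ == "__main__":
    attn = torch.softmax(torch.randn(2, 8), dim=-1)  # stand-in attention weights
    sal = torch.rand(2, 8)                           # stand-in TextRank scores
    print(salience_loss(attn, sal).item())

During fine-tuning, the task loss (e.g. cross-entropy) would be combined with such a term, weighted by a hyperparameter, so that the model retains salience information while learning the downstream task.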
Anthology ID:
2021.emnlp-main.645
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8189–8200
URL:
https://aclanthology.org/2021.emnlp-main.645
DOI:
10.18653/v1/2021.emnlp-main.645
Cite (ACL):
George Chrysostomou and Nikolaos Aletras. 2021. Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8189–8200, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience (Chrysostomou & Aletras, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.645.pdf
Video:
https://aclanthology.org/2021.emnlp-main.645.mp4
Code
gchrysostomou/saloss
Data
SST