Mitigating the Inconsistency Between Word Saliency and Model Confidence with Pathological Contrastive Training

Pengwei Zhan; Yang Wu; Shaolei Zhou; Yunjian Zhang; Liming Wang

doi:10.18653/v1/2022.findings-acl.175

Abstract

Neural networks are widely used in various NLP tasks for their remarkable performance. However, their complexity makes them difficult to interpret, i.e., they are not guaranteed to be right for the right reasons. Beyond complexity, we reveal that a model pathology, namely the inconsistency between word saliency and model confidence, further hurts interpretability. We show that this pathological inconsistency is caused by representation collapse: the representations of sentences from which tokens of different saliency have been removed are collapsed, so important words cannot be distinguished from unimportant ones in terms of the resulting change in model confidence. To mitigate the pathology and obtain more interpretable models, we propose the Pathological Contrastive Training (PCT) framework, which adopts contrastive learning and saliency-based sample augmentation to calibrate sentence representations. Alongside qualitative analysis, we conduct extensive quantitative experiments that measure interpretability with eight metrics. Experiments show that our method mitigates the model pathology and yields more interpretable models while maintaining model performance. An ablation study also demonstrates its effectiveness.
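
The abstract names the ingredients of PCT but not its exact objective. The following is a minimal, hypothetical sketch of those ingredients under stated assumptions, not the paper's formulation: saliency scores and sentence vectors are assumed to be given; "saliency-based augmentation" is assumed to mean deleting the k most or least salient tokens; the contrastive loss is a generic InfoNCE over cosine similarities; and a Spearman rank correlation between saliency and confidence drop is used as one illustrative (not necessarily one of the paper's eight) consistency check. All function names are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def drop_by_saliency(tokens: List[str], saliency: List[float],
                     k: int, most_salient: bool) -> List[str]:
    """Hypothetical augmentation: remove the k most (or least) salient
    tokens while keeping the remaining tokens in their original order."""
    order = sorted(range(len(tokens)), key=lambda i: saliency[i],
                   reverse=most_salient)
    removed = set(order[:k])
    return [t for i, t in enumerate(tokens) if i not in removed]


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE: negative log-softmax of the anchor/positive
    similarity among the positive and all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (assumes no ties), used here to check
    whether higher saliency goes with a larger confidence drop."""
    def ranks(v: Sequence[float]) -> List[int]:
        r = [0] * len(v)
        for rank, i in enumerate(sorted(range(len(v)), key=lambda i: v[i])):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    tokens = ["the", "movie", "was", "terrible"]
    saliency = [0.05, 0.2, 0.05, 0.7]
    # Assumed roles: removing unimportant words gives a positive view,
    # removing important words gives a negative view of the sentence.
    print(drop_by_saliency(tokens, saliency, 1, most_salient=False))
    print(drop_by_saliency(tokens, saliency, 1, most_salient=True))
    # Toy vectors stand in for encoder outputs of the original and the two views.
    print(info_nce([1.0, 0.0], [0.9, 0.1], [[0.1, 0.9]]))
    # Per-token saliency vs. confidence drop when that token is removed.
    print(spearman([0.1, 0.2, 0.3, 0.4], [0.01, 0.05, 0.02, 0.30]))
```

In a real setting the vectors would come from the model's sentence encoder and the loss would be added to the task loss; the intuition this toy captures is that a low rank correlation signals the inconsistency described above, while the contrastive term pushes apart representations that a collapsed encoder would otherwise map close together.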

Anthology ID: 2022.findings-acl.175
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2226–2244
URL: https://aclanthology.org/2022.findings-acl.175/
DOI: 10.18653/v1/2022.findings-acl.175
Cite (ACL): Pengwei Zhan, Yang Wu, Shaolei Zhou, Yunjian Zhang, and Liming Wang. 2022. Mitigating the Inconsistency Between Word Saliency and Model Confidence with Pathological Contrastive Training. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2226–2244, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Mitigating the Inconsistency Between Word Saliency and Model Confidence with Pathological Contrastive Training (Zhan et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.175.pdf
Software: 2022.findings-acl.175.software.zip
Data: AG News, IMDb Movie Reviews