LEXPLAIN: Improving Model Explanations via Lexicon Supervision

Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov, Noah A. Smith


Abstract
Model explanations that shed light on the model’s predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model’s predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXplain, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the model’s explanations without sacrificing performance on the task, as we demonstrate on sentiment analysis and toxicity detection. Our analyses show that our method also demotes spurious correlations (i.e., with respect to African American English dialect) when performing the task, improving fairness.
Anthology ID:
2023.starsem-1.19
Volume:
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Alexis Palmer, Jose Camacho-Collados
Venue:
*SEM
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
207–216
URL:
https://aclanthology.org/2023.starsem-1.19
DOI:
10.18653/v1/2023.starsem-1.19
Cite (ACL):
Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov, and Noah A. Smith. 2023. LEXPLAIN: Improving Model Explanations via Lexicon Supervision. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pages 207–216, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
LEXPLAIN: Improving Model Explanations via Lexicon Supervision (Ahia et al., *SEM 2023)
PDF:
https://aclanthology.org/2023.starsem-1.19.pdf