Model Interpretability and Rationale Extraction by Input Mask Optimization

Marc Brinner, Sina Zarrieß


Abstract
Concurrent with the rapid progress of neural network-based models in NLP, the need to explain the predictions of these black-box models has risen steadily. Yet, especially for complex inputs like texts or images, existing interpretability methods still struggle to derive easily interpretable explanations that also accurately represent the basis for the model's decision. To this end, we propose a new, model-agnostic method for generating extractive explanations of predictions made by neural networks, based on masking the parts of the input that the model does not consider indicative of the respective class. The masking is done by gradient-based optimization combined with a new regularization scheme that enforces sufficiency, comprehensiveness, and compactness of the generated explanation. Our method achieves state-of-the-art results on a challenging paragraph-level rationale extraction task, showing that this task can be performed without training a specialized model. We further apply our method to image inputs and obtain high-quality explanations for image classifications, indicating that the objectives for optimizing explanation masks on text generalize to other input modalities.
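
The abstract describes the core optimization: a soft mask over the input is tuned by gradient descent so that the kept tokens alone support the prediction (sufficiency), the removed tokens do not (comprehensiveness), and the mask stays small (compactness). The following is a minimal sketch of that idea, not the paper's implementation: it assumes a PyTorch classifier `model` mapping token embeddings of shape (1, seq_len, dim) to class logits, approximates masking by multiplying embeddings with the soft mask, and uses illustrative loss weights and hyperparameters.

import torch
import torch.nn.functional as F

def optimize_mask(model, embeddings, target_class,
                  steps=200, lr=0.1, sparsity_weight=0.05):
    # Sketch of gradient-based input mask optimization (illustrative
    # values, not the paper's hyperparameters). `model` is assumed to
    # map token embeddings of shape (1, seq_len, dim) to class logits.
    model.eval()
    seq_len = embeddings.shape[1]
    target = torch.tensor([target_class])

    # Unconstrained logits; a sigmoid keeps the soft mask in [0, 1].
    mask_logits = torch.zeros(1, seq_len, requires_grad=True)
    optimizer = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        mask = torch.sigmoid(mask_logits).unsqueeze(-1)  # (1, seq_len, 1)

        # Sufficiency: the kept tokens alone should support the prediction.
        logits_kept = model(embeddings * mask)
        loss_suff = F.cross_entropy(logits_kept, target)

        # Comprehensiveness: with the rationale masked out, the model's
        # confidence in the target class should drop.
        logits_removed = model(embeddings * (1.0 - mask))
        loss_comp = logits_removed.softmax(-1)[0, target_class]

        # Compactness: an L1-style penalty keeps the rationale short.
        loss_sparse = mask.mean()

        loss = loss_suff + loss_comp + sparsity_weight * loss_sparse
        loss.backward()
        optimizer.step()

    return torch.sigmoid(mask_logits).detach()

A hard rationale can then be read off by thresholding the returned mask, e.g. keeping tokens whose mask value exceeds 0.5.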
Anthology ID:
2023.findings-acl.867
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13722–13744
URL:
https://aclanthology.org/2023.findings-acl.867
DOI:
10.18653/v1/2023.findings-acl.867
Cite (ACL):
Marc Brinner and Sina Zarrieß. 2023. Model Interpretability and Rationale Extraction by Input Mask Optimization. In Findings of the Association for Computational Linguistics: ACL 2023, pages 13722–13744, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Model Interpretability and Rationale Extraction by Input Mask Optimization (Brinner & Zarrieß, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.867.pdf
Video:
https://aclanthology.org/2023.findings-acl.867.mp4