Local Interpretation of Transformer Based on Linear Decomposition

Sen Yang, Shujian Huang, Wei Zou, Jianbing Zhang, Xinyu Dai, Jiajun Chen


Abstract
In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance on a wide range of tasks. However, limited interpretability has hindered their real-world application. This work proposes to interpret neural networks by linear decomposition and finds that a ReLU-activated Transformer can be treated as a linear model on a single input. We further leverage this linearity and propose a linear decomposition of the model output to generate local explanations. Our evaluation on sentiment classification and machine translation shows that our method achieves competitive performance in the efficiency and fidelity of explanations. In addition, we demonstrate the potential of our approach through examples of error analysis on multiple tasks.
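To make the core idea concrete, below is a minimal numpy sketch, not the paper's implementation: the bias-free two-layer network, shapes, and variable names are illustrative assumptions. For a fixed input x, ReLU(W1 x) equals D W1 x, where D is the diagonal 0/1 mask of active units, so the whole network collapses to a single effective linear map A(x) for that input; the per-coordinate terms A[:, j] * x[j] sum exactly to the output and can serve as local attributions.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 6, 8, 3          # illustrative sizes, not from the paper
W1 = rng.standard_normal((d_hid, d_in))
W2 = rng.standard_normal((d_out, d_hid))
x = rng.standard_normal(d_in)

# Ordinary forward pass through a bias-free two-layer ReLU network.
h = np.maximum(W1 @ x, 0.0)
y = W2 @ h

# Input-dependent linearization: replace ReLU with its 0/1 activation mask.
D = np.diag((W1 @ x > 0).astype(float))
A = W2 @ D @ W1                        # effective linear map for this particular x
assert np.allclose(A @ x, y)           # identical output on the same input

# Local explanation: contribution of each input coordinate to the output y.
contributions = A * x                  # contributions[:, j] = A[:, j] * x[j]
assert np.allclose(contributions.sum(axis=1), y)
print(contributions)

A full Transformer adds attention, layer normalization, and bias terms, which the paper's decomposition has to account for; this sketch only illustrates the ReLU-linearity observation at its core.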
Anthology ID:
2023.acl-long.572
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
10270–10287
URL:
https://aclanthology.org/2023.acl-long.572
DOI:
10.18653/v1/2023.acl-long.572
Cite (ACL):
Sen Yang, Shujian Huang, Wei Zou, Jianbing Zhang, Xinyu Dai, and Jiajun Chen. 2023. Local Interpretation of Transformer Based on Linear Decomposition. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10270–10287, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Local Interpretation of Transformer Based on Linear Decomposition (Yang et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.572.pdf
Video:
https://aclanthology.org/2023.acl-long.572.mp4