DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet


Abstract
In recent years, several interpretability methods have been proposed to interpret the inner workings of Transformer models at different levels of precision and complexity. In this work, we propose a simple but effective technique to analyze encoder-decoder Transformers. Our method, which we name DecoderLens, allows the decoder to cross-attend representations of intermediate encoder activations instead of using the default final encoder output. The method thus maps uninterpretable intermediate vector representations to human-interpretable sequences of words or symbols, shedding new light on the information flow in this popular but understudied class of models. We apply DecoderLens to question answering, logical reasoning, speech recognition, and machine translation models, finding that simpler subtasks are solved with high precision by low and intermediate encoder layers.
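As a rough illustration of the core idea, the sketch below decodes from each intermediate encoder layer of a T5 model using the Hugging Face Transformers library. The model name (t5-small), the example input, and the direct reuse of `generate` with `encoder_outputs` are illustrative assumptions, not the authors' released implementation; details such as re-applying the encoder's final layer normalization to intermediate states are deliberately left out.

```python
# Minimal DecoderLens-style sketch (illustrative, not the paper's code):
# let the decoder generate from an *intermediate* encoder layer's hidden
# states instead of the final encoder output.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-small")  # illustrative model choice
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

inputs = tokenizer(
    "translate English to German: The cat sat on the mat.",
    return_tensors="pt",
)

with torch.no_grad():
    # Run the encoder once, keeping hidden states from every layer.
    enc = model.encoder(**inputs, output_hidden_states=True)

# enc.hidden_states is a tuple of (num_layers + 1) tensors: index 0 is the
# embedding output, index -1 equals the usual final encoder output.
# Note: intermediate states skip T5's final LayerNorm; for faithful results
# one may need to apply it to them as well (omitted here for brevity).
for layer_idx in range(1, len(enc.hidden_states)):
    intermediate = BaseModelOutput(last_hidden_state=enc.hidden_states[layer_idx])
    out_ids = model.generate(
        encoder_outputs=intermediate,           # decoder cross-attends layer k
        attention_mask=inputs["attention_mask"],
        max_new_tokens=20,
    )
    print(f"layer {layer_idx}:",
          tokenizer.decode(out_ids[0], skip_special_tokens=True))
```

Printing one decoded sequence per encoder layer makes the paper's qualitative finding inspectable: lower layers tend to yield degenerate or partial outputs, while later layers converge toward the full model's prediction.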
Anthology ID:
2024.findings-naacl.296
Volume:
Findings of the Association for Computational Linguistics: NAACL 2024
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4764–4780
URL:
https://aclanthology.org/2024.findings-naacl.296
Cite (ACL):
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, and Jaap Jumelet. 2024. DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 4764–4780, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers (Langedijk et al., Findings 2024)
PDF:
https://aclanthology.org/2024.findings-naacl.296.pdf
Copyright:
2024.findings-naacl.296.copyright.pdf