Investigating the Encoding of Words in BERT’s Neurons Using Feature Textualization

Tanja Baeumel, Soniya Vijayakumar, Josef van Genabith, Guenter Neumann, Simon Ostermann


Abstract
Pretrained language models (PLMs) form the basis of most state-of-the-art NLP technologies. Nevertheless, they are essentially black boxes: humans do not have a clear understanding of what knowledge is encoded in different parts of the models, especially in individual neurons. In computer vision, by contrast, feature visualization provides a decompositional interpretability technique for neurons of vision models: activation maximization is used to synthesize inherently interpretable visual representations of the information encoded in individual neurons. Our work is inspired by this, but presents a cautionary tale on the interpretability of single neurons, based on the first large-scale attempt to adapt activation maximization to NLP and, more specifically, to large PLMs. We propose feature textualization, a technique to produce dense representations of neurons in the PLM word embedding space. We apply feature textualization to the BERT model to investigate whether the knowledge encoded in individual neurons can be interpreted and symbolized. We find that the produced representations can provide insights about the knowledge encoded in individual neurons, but that individual neurons do not represent clear-cut symbolic units of language such as words. Additionally, we use feature textualization to investigate how many neurons are needed to encode words in BERT.
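As a rough illustration of the activation-maximization idea described in the abstract, the sketch below optimizes a continuous pseudo-input directly in BERT's word embedding space so as to maximize a chosen neuron's activation, then maps the optimized vectors back to the nearest vocabulary tokens. This is a minimal sketch of the general technique, not the paper's exact procedure; the layer and neuron indices, sequence length, learning rate, and number of optimization steps are illustrative assumptions.

    # Minimal activation-maximization sketch in BERT's input embedding space.
    # Assumptions (not from the paper): target layer/neuron, seq length, lr, steps.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    target_layer, target_neuron = 6, 1234          # assumed target neuron
    seq_len, hidden = 10, model.config.hidden_size

    # Optimize a continuous "pseudo-input" directly in the embedding space.
    pseudo_input = torch.randn(1, seq_len, hidden, requires_grad=True)
    optimizer = torch.optim.Adam([pseudo_input], lr=0.1)

    for step in range(200):
        optimizer.zero_grad()
        outputs = model(inputs_embeds=pseudo_input, output_hidden_states=True)
        # Activation of the target neuron, averaged over token positions.
        activation = outputs.hidden_states[target_layer][0, :, target_neuron].mean()
        (-activation).backward()                   # gradient ascent on the activation
        optimizer.step()

    # Project the optimized dense vectors onto the nearest vocabulary embeddings
    # to obtain a human-readable approximation of what the neuron responds to.
    vocab_emb = model.embeddings.word_embeddings.weight        # (vocab_size, hidden)
    sims = torch.matmul(pseudo_input[0].detach(), vocab_emb.T) # (seq_len, vocab_size)
    tokens = tokenizer.convert_ids_to_tokens(sims.argmax(dim=-1).tolist())
    print(tokens)

The nearest-token projection at the end is one plausible way to read out the dense result; the paper's notion of a "dense representation in the PLM word embedding space" refers to the optimized vectors themselves, which need not coincide with any single vocabulary item.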
Anthology ID:
2023.blackboxnlp-1.20
Volume:
Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
Month:
December
Year:
2023
Address:
Singapore
Editors:
Yonatan Belinkov, Sophie Hao, Jaap Jumelet, Najoung Kim, Arya McCarthy, Hosein Mohebbi
Venues:
BlackboxNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
261–270
URL:
https://aclanthology.org/2023.blackboxnlp-1.20
DOI:
10.18653/v1/2023.blackboxnlp-1.20
Cite (ACL):
Tanja Baeumel, Soniya Vijayakumar, Josef van Genabith, Guenter Neumann, and Simon Ostermann. 2023. Investigating the Encoding of Words in BERT’s Neurons Using Feature Textualization. In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 261–270, Singapore. Association for Computational Linguistics.
Cite (Informal):
Investigating the Encoding of Words in BERT’s Neurons Using Feature Textualization (Baeumel et al., BlackboxNLP-WS 2023)
PDF:
https://aclanthology.org/2023.blackboxnlp-1.20.pdf