Ecco: An Open Source Library for the Explainability of Transformer Language Models

Our understanding of why Transformer-based NLP models have been achieving their recent success lags behind our ability to continue scaling these models. To increase the transparency of Transformer-based language models, we present Ecco – an open-source library for the explainability of Transformer-based NLP models. Ecco provides a set of tools to capture, analyze, visualize, and interactively explore the inner mechanics of these models. This includes (1) gradient-based feature attribution for natural language generation, (2) hidden states and their evolution between model layers, (3) convenient access and examination tools for neuron activations in the under-explored Feed-Forward Neural Network sublayer of Transformer layers, and (4) convenient examination of activation vectors via canonical correlation analysis (CCA), non-negative matrix factorization (NMF), and probing classifiers. We find that syntactic information can be retrieved from BERT's FFNN representations at levels comparable to those in hidden state representations. More curiously, we find that the model builds up syntactic information in its hidden states even when intermediate FFNNs indicate diminished levels of syntactic information. Ecco is available at https://www.eccox.io/


Introduction
The Transformer architecture (Vaswani et al., 2017) has been powering many recent advances in NLP. A breakdown of this architecture is provided by Alammar (2018) and will help in understanding this paper's details. Pre-trained language models based on the architecture (Liu et al., 2018; Devlin et al., 2018; Radford et al., 2018, 2019; Liu et al., 2019; Brown et al., 2020) continue to push the envelope in various tasks in NLP and, more recently, in computer vision (Dosovitskiy et al., 2020). Our understanding of why these models work so well, however, still lags behind these developments.

The code is available at https://github.com/jalammar/ecco and a video demo at https://youtu.be/bcEysXmR09c

Figure 1: A set of tools to make the inner workings of Transformer language models more transparent. By introducing tools that analyze and visualize input saliency (for natural language generation), hidden states, and neuron activations, we aim to enable researchers to build more intuition about Transformer language models.
Ecco provides tools and interactive explorable explanations aiding the examination and intuition of:
• Input saliency methods that score input tokens' importance to generating a token. This is discussed in section 2.
• Hidden state evolution across the layers of the model and what it may tell us about each layer's role. This is discussed in section 3.
• Neuron activations and how individual neurons and groups of neurons spike in response to inputs and to produce outputs. This is discussed in section 4.
• Non-negative matrix factorization of neuron activations to uncover underlying patterns of neuron firings, revealing firing patterns of linguistic properties of input tokens. This is discussed in subsection 4.2.
Ecco creates rich, interactive interfaces directly inside Jupyter notebooks (Ragan-Kelley et al., 2014) running on pre-trained models from the Hugging Face transformers library (Wolf et al., 2020). Currently it supports GPT2 (Radford et al., 2018), BERT (Devlin et al., 2018), and RoBERTa (Liu et al., 2019). Support for more models and explainability methods is under development and open for community contribution.

Input Saliency
When a computer vision model classifies a picture as containing a husky, an input saliency map (Figure 2) can tell us whether the classification was made due to the visual properties of the animal itself or because of the snow in the background (Ribeiro et al., 2016). This is a method of attribution explaining the relationship between a model's output and inputs, helping us detect errors and biases to better understand the system's behavior. Multiple methods exist for assigning feature importance scores to the inputs of an NLP model (Arrieta et al., 2020). Instead of assigning scores to pixels, in the NLP domain these methods assign scores to input tokens. The literature is most often concerned with this application for classification tasks rather than natural language generation. Ecco enables generating output tokens and then interactively exploring the saliency values for each output token.

Saliency View
In Figure 3, we see an experiment to probe the world knowledge of GPT2-XL. We ask the model to output William Shakespeare's date of birth. The model correctly produces the date (1564, but broken into two tokens: 15 and 64, because the model's vocabulary does not include 1564 as a single token). By hovering over each output token, Ecco imposes each input token's saliency value as a background color: the darker the color, the more responsibility that input token is attributed for generating the output token.

Figure 3: Input saliency for the generated tokens 15 and 64, shown using Gradient X Inputs. The darker the background color of a token, the higher its saliency value.

Detailed Saliency View
Ecco also provides a detailed view to see the attribution values in more precision. Figure 4 demonstrates this interactive interface, which displays the normalized attribution value as a percentage and a bar next to each token.

About Gradient-Based Saliency. Ecco calculates feature importance based on Gradients X Inputs (Denil et al., 2015; Shrikumar et al., 2017), a gradient-based saliency method shown by Atanasova et al. (2020) to perform well across various datasets for text classification in Transformer models.
Gradients X Inputs can be calculated using the following formula:

$\nabla_{X_i} f_c(X_{1:n}) \times X_i$

where $X_i$ is the embedding vector of the input token at timestep $i$, and $\nabla_{X_i} f_c(X_{1:n})$ is the gradient of the score of the selected output token backpropagated to that embedding. The resulting vector is then aggregated into a score by calculating its L2 norm, as this was empirically shown by Atanasova et al. (2020) to perform better than other aggregation methods.
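The computation above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration (the function name and arguments are ours, not Ecco's API), assuming we already have the input embeddings with gradients enabled and the scalar score of the selected output token:

```python
import torch

def grad_x_input_saliency(embeddings, score):
    """Gradient X Inputs saliency, aggregated per token via the L2 norm.

    embeddings: (seq_len, d_model) tensor with requires_grad=True,
        the input token embeddings X_1:n.
    score: scalar tensor, the score f_c of the selected output token
        (e.g., its logit).
    Returns a (seq_len,) tensor of normalized saliency values.
    """
    # Backpropagate the selected token's score to the input embeddings.
    grads = torch.autograd.grad(score, embeddings)[0]  # (seq_len, d_model)
    # Element-wise product of gradient and input, then L2 norm per token.
    saliency = (grads * embeddings).norm(p=2, dim=-1)  # (seq_len,)
    # Normalize so scores over the input sum to 1 (as in the detailed
    # saliency view's percentages).
    return (saliency / saliency.sum()).detach()
```

The normalization in the last step mirrors the percentages shown in the detailed saliency view.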

Hidden States Examination
Another method to glean information about the inner workings of a language model is to examine the hidden states produced by every Transformer block. Ecco provides multiple methods to examine the hidden states and to visualize how they evolve across the layers of the model.
Ecco bundles these methods (cca(), svcca(), pwcca(), and cka()) to allow convenient similarity comparison of language model representations. This includes hidden state representations, but also extends to neuron activations (Ecco pays special attention to the neurons after the largest dense FFNN layer, as can be seen in Section 4). Figure 6 shows a comparison of the hidden states and FFNN neuron activations as the model processes textual input. These methods take two activation vectors (be they hidden states or neuron activations) and assign a similarity score ranging from zero (no correlation) to one (for the CCA methods, indicating that the two inputs are linear transformations of each other).
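As a concrete illustration of one of these similarity measures, linear CKA (Kornblith et al., 2019) can be computed directly from two activation matrices. This is a simplified sketch, not Ecco's implementation (Ecco delegates to the authors' open-sourced code):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment (Kornblith et al., 2019).

    X, Y: (n_tokens, d1) and (n_tokens, d2) activation matrices,
        e.g., hidden states from two layers for the same tokens.
    Returns a similarity score in [0, 1]; the score is invariant to
    orthogonal transformations and isotropic scaling of either input.
    """
    # Center each representation across tokens.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator
```

Note that the two representations may have different dimensionalities (e.g., a 768-dimensional hidden state versus 3072 FFNN activations), as long as they cover the same tokens.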

Ranking of Output Token Across Layers
Nostalgebraist (2020) presents compelling visual treatments showcasing the evolution of token rankings, logit scores, and softmax probabilities for the evolving hidden state through the various layers of the model. The author does this by projecting the hidden state into the output vocabulary using the language model head (which is typically used only for the output of the final layer).
Ecco enables creating such plots as can be seen in Figure 7. More examples showcasing this method can be found in (Alammar, 2021).
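The projection described above can be sketched with plain NumPy. This is an illustrative stand-in (function and variable names are ours), assuming access to an intermediate hidden state and the model's LM head weight matrix:

```python
import numpy as np

def rank_token_at_layer(hidden_state, lm_head, token_id):
    """Rank a vocabulary token by projecting an intermediate hidden
    state through the language model head.

    hidden_state: (d_model,) hidden state at some intermediate layer.
    lm_head: (vocab_size, d_model) LM head weight matrix, normally
        applied only to the final layer's output.
    token_id: vocabulary id of the token of interest.
    Returns (rank, probability) at this layer; rank 1 means the token
    is the layer's top prediction.
    """
    logits = lm_head @ hidden_state               # (vocab_size,)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax
    # Rank = 1 + number of tokens scoring strictly higher.
    rank = 1 + int((logits > logits[token_id]).sum())
    return rank, float(probs[token_id])
```

Repeating this for every layer's hidden state yields the per-layer ranking trajectories plotted in such figures.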

Comparing Token Rankings
Ecco also allows asking which of two tokens the model chooses to output for a specific position. This includes questions of subject-verb agreement like those posed by Linzen et al. (2016). In that task, we want to analyze the model's capacity to encode syntactic number (whether the subject we're addressing is singular or plural) and syntactic subjecthood (which subject in the sentence we're addressing). Put simply, fill in the blank, where the only acceptable answers are 1) is or 2) are:

The keys to the cabinet ___

Using Ecco, we can present this sentence to DistilGPT-2 and visualize the rankings of is and are using ecco's rankings_watch(), which creates Figure 8. The first column shows the rankings of the token is as the completion of the sentence, and the second column shows those for the token are for that same position. The model ultimately ranks are as the more probable answer, but the figure raises the question of why five layers fail to rank are higher than is, and only the final layer sets the record straight.

Neuron Activations
The Feed-Forward Neural Network (FFNN) sublayer is one of the two major components inside a Transformer block (in addition to self-attention). It often makes up two-thirds of a Transformer block's parameters, thus providing a significant portion of the model's representational capacity. Previous work (Karpathy et al., 2015; Strobelt et al., 2017; Poerner et al., 2018; Radford et al., 2017; Olah et al., 2017, 2018; Bau et al., 2018; Dalvi et al., 2019; Rethmeier et al., 2020) has examined neuron firings inside deep neural networks in both the NLP and computer vision domains. Ecco makes it easier to examine neuron activations by collecting them and providing tools to analyze them and reduce their dimensionality to extract underlying patterns.

Probing classifiers
Probing classifiers (Veldhoen et al., 2016; Adi et al., 2016; Conneau et al., 2018) are the most commonly used method for associating NLP model components with linguistic properties. Ecco currently supports linear probes with control tasks (Hewitt and Liang, 2019). Section 5 is a case study on using this method to probe FFNN representations for part-of-speech information.

Uncovering underlying patterns with NMF
By first capturing the activations of the neurons in FFNN layers of the model and then decomposing them into a more manageable number of factors through NMF, we can shed light on how various neuron groups respond to input tokens. Figure 9 shows intuitively interpretable firing patterns extracted from raw firings through NMF. This example, showcasing ten factors applied to the activations of layer #0 in response to a text passage, helps us identify neurons that respond to syntactic and semantic properties of the input text. The factor highlighted in this screenshot, factor 5, seems to correlate with pronouns.
This interface compactly displays a lot of data showcasing the activation levels of factors (and, by extension, groups of neurons). The sparklines (Tufte, 2006) on the left give a snapshot of the activation level of each factor across the entire sequence. Interacting with the sparklines (by hovering with a mouse or tapping) displays the activation of the factor on the tokens in the sequence on the right. Figure 10 explains the intuition behind dimensionality reduction using NMF. This method can reveal underlying behavior common to groups of neurons.
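The decomposition itself can be sketched with scikit-learn's NMF. The activation matrix below is a random, hypothetical stand-in for activations captured from one layer; the layer width (3072 neurons) and factor count (ten) follow the example in the text:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical activations: 3072 FFNN neurons x 50 input tokens. NMF
# requires non-negative values, which activations clipped at zero
# (as with ReLU/GELU-style nonlinearities) satisfy.
rng = np.random.default_rng(0)
activations = np.clip(rng.normal(size=(3072, 50)), 0, None)

# Decompose into ten factors: W maps neurons to factors, H gives each
# factor's firing pattern across the token sequence (the sparklines).
nmf = NMF(n_components=10, init="nndsvd", max_iter=500)
W = nmf.fit_transform(activations)  # (3072, 10) neuron-to-factor weights
H = nmf.components_                 # (10, 50) factor activation per token
```

Each row of H is then imposed on the tokens as background color in the interactive view.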

About Matrix Factorization of Neuron Activity
Matrix factorization can be used to analyze the entire network, a single layer, or groups of layers.

Figure 9: Individual factor view: activation pattern in response to pronouns. Ten factors extracted from the activations of the neurons in Layer 0 in response to a passage from Notes from Underground. Hovering over the line graphs isolates the tokens of a single factor and imposes the magnitude of the factor's activation on the tokens as a background color; the darker the color, the higher the activation magnitude. In addition to the pronouns factor highlighted in the figure, we can see factors that focus on specific regions of the text (beginning, middle, and end), indicating neurons that are sensitive to positional encodings. View this interface online at (Alammar, 2020).

Case study: Probing FFNN neuron activations for PoS information
In this section, we use Ecco to examine the representations of BERT's Feed-Forward Neural Network using probing classifiers. Our work is most similar to Durrani et al. (2020). There has been plenty of work probing BERT's hidden states, but none to our knowledge that trains probes to extract token information from the FFNN representations.

Method
We first forward-pass the entire dataset through BERT. We capture all the hidden states of all the model's layers as well as the neuron activations of the FFNN sublayers (namely, the widest layer, composed of 3072 neurons after the GELU activation). We then train external linear classifiers to predict the PoS of the tokens in the dataset and report the accuracy on the test set. Because probes have been criticized as memorizing the inputs, we report selectivity scores (Hewitt and Liang, 2019) next to each accuracy score. Selectivity is a metric calculated by generating a control task where each token is assigned a random part-of-speech tag. A separate probe is then trained on this control set. The difference in accuracy between the actual dataset and the control dataset is the selectivity score. The higher the selectivity, the more we can say that the probe really extracted part-of-speech information from the representation of the model as opposed to simply memorizing the training set.
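The probe-plus-control-task procedure can be sketched as follows. This is a simplified stand-in (scikit-learn's logistic regression as the linear probe, random per-token control labels as described above), not the exact training setup used in the experiments:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(features, labels, seed=0):
    """Train a linear probe on captured representations and return
    its test-set accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.33, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

def selectivity(features, labels, n_tags, seed=0):
    """Selectivity = accuracy on real PoS labels minus accuracy on a
    control task where each token is assigned a random tag."""
    real_acc = probe_accuracy(features, labels, seed)
    rng = np.random.default_rng(seed)
    control_labels = rng.integers(0, n_tags, size=len(labels))
    control_acc = probe_accuracy(features, control_labels, seed)
    return real_acc, real_acc - control_acc
```

A probe that merely memorizes its inputs would score similarly on both label sets, yielding low selectivity.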

Experimental Setup
We use the Universal Dependencies version 2 part-of-speech dataset in English. We extract 10,000 tokens and split them into 67%/33% train and test sets. We train linear probes for 50 epochs using the Adam optimizer. We run experiments with learning rates (0.1, 0.001, 1e-5) and report those of the best-performing learning rate (0.001). We run five trials and report their average results. For every trial, we train a probe for each permutation of 1) model layer, 2) hidden state vs. FFNN activations, and 3) actual labels vs. random controls to calculate selectivity scores.

Results
We report accuracy and selectivity scores in Table 1. The results suggest that FFNN representations hold subsets of PoS information which the model is able to collect and assemble as it builds up its internal representations across layers.

System Design
Ecco is implemented as a Python library that provides a wrapper around a pre-trained language model. The wrapper collects the required data from the language model (e.g., neuron activations, hidden states) and makes the needed calculations (e.g., input saliency, NMF dimensionality reduction). The interactive visualizations are built using web technologies manipulated through D3.js (Bostock et al., 2012). Ecco is built on top of open source libraries including Scikit-Learn (Pedregosa et al., 2011), Matplotlib (Hunter, 2007), NumPy (Walt et al., 2011), PyTorch (Paszke et al., 2019), and Transformers (Wolf et al., 2020). Canonical Correlation Analysis is calculated using the code open-sourced by the authors at https://github.com/google/svcca (Raghu et al., 2017; Morcos et al., 2018; Kornblith et al., 2019).

Limitations
Ecco's input saliency feature is currently only supported for GPT2-based models, while neuron activation collection and dimensionality reduction are supported for GPT2 in addition to BERT and RoBERTa.
We echo the sentiment of Leavitt and Morcos (2020) that visualization has a role in building intuitions, but that researchers are encouraged to use that as a starting point towards building testable and falsifiable hypotheses of model interpretability.

Conclusion
As language models proliferate, more tools are needed to help debug models, explain their behavior, and build intuitions about their inner mechanics. Ecco is one such tool, combining ease of use, visual interactive explorables, and multiple model explainability methods.
Ecco is open-source software 5 and contributions are welcome.