Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools

In the language domain, as in other domains, neural explainability takes an ever more important role, with feature attribution methods at the forefront. Many such methods require considerable computational resources and expert knowledge about implementation details and parameter choices. To facilitate research, we present Thermostat, which consists of a large collection of model explanations and accompanying analysis tools. Thermostat allows easy access to over 200k explanations for the decisions of prominent state-of-the-art models spanning different NLP tasks, generated with multiple explainers. The dataset took over 10k GPU hours (> one year) to compile; compute time that the community now saves. The accompanying software tools allow analysing explanations not only instance-wise but also cumulatively at the corpus level. Users can investigate and compare models, datasets and explainers without having to orchestrate implementation details. Thermostat is fully open source, democratizes explainability research in the language domain, circumvents redundant computations and increases comparability and replicability.


Introduction
Deep neural networks are state-of-the-art in natural language processing (NLP), but due to their complexity they are commonly perceived as opaque (Karpathy et al., 2015; Li et al., 2017). For this reason, explainability has seen heightened attention in recent years (Belinkov and Glass, 2019; Wallace et al., 2020; Danilevsky et al., 2020).
The feature attribution maps produced and used in the above-cited works are arguably the most crucial component of these studies. Unfortunately, none of the above-cited papers explicitly links to the generated attribution maps. Easy access to a wide variety of such feature attribution maps, across models, datasets and explainers, however, holds a lot of potential. A central hub (1) would increase the comparability and replicability of explainability research, (2) would mitigate the computational burden, and (3) would mitigate the implementational burden, since in-depth expert knowledge of the explainers and models is otherwise required. Put differently, a central data hub containing a wide variety of feature attribution maps and offering easy access to them would (1) democratize explainability research to a certain degree, and (2) contribute to green NLP (Strubell et al., 2019) and green XAI (Schwarzenberg et al., 2021) by circumventing redundant computations.
For this reason, we compiled THERMOSTAT, an easily accessible data hub that contains a large quantity of feature attribution maps, from numerous explainers, for multiple models, trained on different tasks. Alongside the dataset, we publish a compatible library with analysis and convenience functions. In this paper, we introduce the data hub, showcase the ease of access and discuss use cases.

    import thermostat
    data = thermostat.load("imdb-bert-lig")
    example = data[0]
    example.render()

Figure 1: Code examples. Top: Loading a dataset and extracting a single instance. Bottom: Visualizing the instance as a heatmap on token level.

THERMOSTAT
THERMOSTAT is intended to be a continuous, collaborative project. As new models, datasets and explainers are published, we invite the community to extend THERMOSTAT. In what follows, we describe the current state, which is published under https://github.com/DFKI-NLP/thermostat.

Demonstration
First, we demonstrate the ease of access. Downloading a dataset requires just two lines of code, as illustrated in the snippet in Fig. 1, in which we download the attribution maps as returned by the (Layer) Integrated Gradients explainer (Sundararajan et al., 2017) for BERT classifications (Devlin et al., 2019) on the IMDb test set (Maas et al., 2011). In addition to the list of feature attributions, the input IDs, the true label of every instance given by the underlying dataset, and the logits of the model's predictions are shipped. Switching the explainer, model or dataset only requires changing the configuration identifier string ("imdb-bert-lig" in Fig. 1). All configuration identifiers in THERMOSTAT consist of three coordinates: dataset, model, and explainer. A visualization tool which returns heatmaps like the one shown in Fig. 1 is also contained in the accompanying library.
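Since a configuration identifier is just the three coordinates joined by dashes, switching configurations can be sketched as follows. The helper function below is our own illustration of the naming pattern, not an API of the thermostat library:

```python
# Illustration only: THERMOSTAT identifiers follow the pattern
# "<dataset>-<model>-<explainer>"; this helper is a hypothetical
# convenience, not part of the library itself.
def config_id(dataset: str, model: str, explainer: str) -> str:
    """Compose a THERMOSTAT configuration identifier from its three coordinates."""
    return "-".join([dataset, model, explainer])

print(config_id("imdb", "bert", "lig"))          # -> imdb-bert-lig
print(config_id("multi_nli", "electra", "occ"))  # -> multi_nli-electra-occ
```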
The object that holds the data after download inherits from the Dataset class of the datasets library (Lhoest et al., 2021). This is convenient, because data is cached and versioned automatically in the background and processed efficiently in parallel. Furthermore, datasets provides many convenience functions which can be applied directly, such as filtering and sorting.
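The kind of filtering the datasets library provides can be mimicked in plain Python on mock records. The field names and values below are illustrative assumptions; the real library applies Dataset.filter to cached, versioned data:

```python
# Mock instances standing in for THERMOSTAT records (hypothetical fields);
# datasets' Dataset.filter behaves analogously on the real downloaded data.
instances = [
    {"idx": 0, "true_label": 1, "predicted_label": 1},
    {"idx": 1, "true_label": 0, "predicted_label": 1},
    {"idx": 2, "true_label": 1, "predicted_label": 0},
]

# Keep only misclassified instances, analogous to Dataset.filter(...)
misclassified = [x for x in instances if x["true_label"] != x["predicted_label"]]
print([x["idx"] for x in misclassified])  # -> [1, 2]
```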
Let us see how this helps us to efficiently compare models and explainers in THERMOSTAT. Let us first compare models. In what follows we consider BERT (Devlin et al., 2019) and ELECTRA (Clark et al., 2020), both trained on the MultiNLI (Williams et al., 2018) dataset. We are particularly interested in instances that the two models disagree on. Downloading the explanations and filtering for disagreements is again only a matter of a few lines, as demonstrated in Fig. 2a.
We derive explanations for the output neuron with the maximum activation. In Fig. 2b, we observe that the Occlusion (Zeiler and Fergus, 2014) explainer does not attribute much importance to the phrase "can be lost in an instant". This is plausible since the heatmap explains a misclassification: the maximum output activation stands for entailment, but the correct label is contradiction, and the phrase certainly is a signal for contradiction. In contrast, in the case of ELECTRA (Fig. 2c), which correctly classified the instance, the signal phrase receives much higher importance scores.

    import thermostat
    bert = thermostat.load("multi_nli-bert-occ")
    electra = thermostat.load("multi_nli-electra-occ")

Figure 2a: Code example that loads two THERMOSTAT datasets. We create a list of instances (disagreement) where the two models (BERT and ELECTRA) do not agree with each other regarding the predicted labels. We then select a demonstrative instance from it.

After having demonstrated how to compare models, let us compare explainers across datasets, as done in previous works. Here, we partly replicate the experiments of Neely et al. (2021). The authors compute the rank correlation in terms of Kendall's τ between feature attribution maps. If two explainers mostly agree on the importance rank of input features, the value should be high; low otherwise. The authors find that the Integrated Gradients explainer and the LIME (Ribeiro et al., 2016) explainer have a higher τ value (agree more) for a MultiNLI model (.1794) than when used to rank features for an IMDb-trained classifier (.1050).
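The disagreement filter described for Fig. 2a can be sketched in plain Python. The logits below are mock values and the code is our own illustration; the real filter operates on the logits shipped with the downloaded datasets:

```python
# Sketch of the disagreement filter from Fig. 2a on mock logits;
# the values and the argmax helper are illustrative, not library code.
def argmax(logits):
    """Index of the maximally activated output neuron."""
    return max(range(len(logits)), key=logits.__getitem__)

bert_logits = [[0.1, 2.3, -1.0], [1.5, 0.2, 0.1]]    # mock BERT outputs
electra_logits = [[0.0, 0.4, 1.9], [1.2, 0.3, 0.0]]  # mock ELECTRA outputs

# Indices of instances where the two models predict different labels
disagreement = [i for i, (b, e) in enumerate(zip(bert_logits, electra_logits))
                if argmax(b) != argmax(e)]
print(disagreement)  # -> [0]
```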

Maintenance
However, explainers such as LIME involve several hyperparameters (number of samples, sampling method, similarity kernel, ...) and thus results can deviate for other choices. THERMOSTAT datasets are versioned and for each version a configuration file is checked in that contains the hyperparameter choices.
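The shape of such a checked-in configuration file can be sketched as a Python dict. All keys and values below are illustrative assumptions, not THERMOSTAT's actual schema:

```python
# Hypothetical hyperparameter record for a LIME run; the keys and values
# are illustrative assumptions, not the library's actual configuration schema.
lime_config = {
    "explainer": "LimeBase",
    "n_samples": 25,                # number of perturbed samples per instance
    "mask_prob": 0.3,               # probability of masking a token when sampling
    "similarity_kernel": "cosine",  # kernel used to weight perturbed samples
    "version": "1.0",               # dataset version this configuration belongs to
}
print(sorted(lime_config))
```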
If new best practices or bugs emerge, e.g. in terms of hyperparameters, an updated dataset can be uploaded in a new version. This increases comparability and replicability.
There is also a seamless integration with Hugging Face's datasets, as mentioned above. As a result, explanation datasets that are published through datasets can be used in THERMOSTAT directly. When contributing new explanation datasets, users simply add the metadata about the three coordinates and make sure that the fields listed in Fig. 1 are contained. As soon as the dataset is published on the community hub, it can be downloaded and parsed by THERMOSTAT. More details are provided in the repository.
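A contribution sanity check could look like the following. The field names are our reading of the fields shipped with each instance (attributions, input IDs, true label, model logits), not an official schema:

```python
# Hypothetical sanity check for a contributed explanation dataset: the
# required field names are assumptions based on the paper's description,
# not an official THERMOSTAT schema.
required = {"attributions", "input_ids", "label", "predictions"}

record = {
    "attributions": [0.12, -0.03, 0.4],  # toy feature attribution scores
    "input_ids": [101, 2023, 102],       # toy token ids
    "label": 1,                          # toy gold label
    "predictions": [-1.2, 2.3],          # toy model logits
}

missing = required - set(record)
print(sorted(missing))  # -> []
```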

Current State
After discussing use and maintenance, we will now present the current state of THERMOSTAT. Please recall that THERMOSTAT is intended to be a continuous, collaborative project.
With the publication of this paper, we contribute the explanations listed in Tab. 1. In total, the compute time for generating the explanations already amounts to more than 10,000 GPU hours; computational resources that the community does not have to invest repeatedly now. Please note that the table is organized around the three coordinates: datasets, models, and explainers, the choices of which we discuss in the following.

    import thermostat
    from scipy.stats import kendalltau
    lime, ig = thermostat.load("imdb-bert-lime"), thermostat.load("imdb-bert-lig")
    lime, ig = lime.attributions.flatten(), ig.attributions.flatten()
    print(kendalltau(lime, ig))
    # KendalltauResult(correlation=0.025657302000906455, pvalue=0.0)

Figure 3: Code example for investigating the rank correlation between LIME and (Layer) Integrated Gradients explanations on IMDb + BERT. The analogous calculation of Kendall's τ for MultiNLI is left out for brevity. We simply state the value in the last line.
Datasets
We hypothesize that instances that the model did not encounter at training time are more informative than known inputs. This is why we concentrated our computational resources on the respective test splits. In total, these amount to almost 50,000 instances already.

Models
The second coordinate in THERMOSTAT is the model. Currently, five model architectures are included, namely ALBERT (Lan et al., 2020), BERT (Devlin et al., 2019), ELECTRA (Clark et al., 2020), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019). We chose community-trained fine-tuned classifiers as they are readily available through the transformers library (Wolf et al., 2020), several of which are provided by TextAttack (Morris et al., 2020). The repository that we publish can be used to quickly include new models, if they are provided through the transformers library.
Explainers

Provided through the Captum library (Kokhlikyan et al., 2020), there are five prominent feature attribution methods included in THERMOSTAT. (Layer) Gradient x Activation (Shrikumar et al., 2017) is an efficient method without hyperparameters. Integrated Gradients (Sundararajan et al., 2017) approximates attributions by integrating gradients along a path from a baseline to the input.

To produce the feature attribution maps, we used up to 24 NVIDIA GPUs in parallel, namely GTX 1080Ti, RTX 2080Ti, RTX 3090, Quadro RTX 6000 and RTX A6000.
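To see why (Layer) Gradient x Activation needs no hyperparameters, consider a linear scorer y = w · x: the gradient of y with respect to x_i is w_i, so the attribution of feature i is simply w_i * x_i. A toy pure-Python illustration (not Captum's implementation, which computes the gradients with autograd on real models):

```python
# Toy sketch of Gradient x Activation for a linear scorer y = w . x:
# the gradient of y w.r.t. x_i is w_i, so attribution_i = w_i * x_i.
# Values are illustrative only.
w = [0.5, -1.0, 2.0]  # toy weights = gradients of y w.r.t. the inputs
x = [1.0, 2.0, 0.5]   # toy input activations

attributions = [wi * xi for wi, xi in zip(w, x)]
print(attributions)  # -> [0.5, -2.0, 1.0]

# For a linear model the attributions sum exactly to the output score.
y = sum(wi * xi for wi, xi in zip(w, x))
assert abs(sum(attributions) - y) < 1e-9
```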

Related Work
To the best of our knowledge, the closest work to our contribution is the Language Interpretability Tool (LIT) by Tenney et al. (2020) which offers a graphical interface for exploring saliency maps, counterfactual generation and the visualization of attention and embeddings. We also draw connections to Hoover et al. (2020) and Lal et al. (2021) who developed interfaces for analyzing the attention mechanism and embeddings of Transformer architectures. These again are visualization and analysis tools and as such complementary to the collection of explanations that is our primary contribution. Thus, an interface that bridges THERMOSTAT and LIT, for instance, is an interesting future direction.
There exist libraries, such as Captum (Kokhlikyan et al., 2020) or transformers-interpret (Pierse, 2021), that facilitate the generation of neural explanations. These libraries do not, however, relieve the user of the computational burden, nor are they easily accessible to non-technical researchers.
As noted in Section 2, the datasets library (Lhoest et al., 2021) functions as the backbone of THERMOSTAT. The novelties our work brings with it are (1) attributions from a variety of models that took over 10,000 GPU hours to compute in total, and (2) the support for explainability-specific visualizations and statistics like heatmap visualization and rank correlation between multiple explainers.
Finally, tangentially related to our work are datasets that supply explanations on top of texts and labels, usually collected from human annotators. e-SNLI (Camburu et al., 2018) probably is the most famous example in this line of work. The reader is referred to Wiegreffe and Marasović (2021) for a concise survey. In contrast to our work, the above-mentioned papers present human ground truths instead of machine-generated explanations.

Table 1 (excerpt): AG News — test split, 7,600 instances, 4 classes; explainers: LGxA, LIG, LIME, Occ, SVS; models: textattack/albert-base-v2-ag-news (ALBERT), textattack/bert-base-uncased-ag-news (BERT), textattack/roberta-base-ag-news (RoBERTa).

Conclusion
We present THERMOSTAT, an easily accessible data hub containing a large collection of NLP model explanations from prominent and mostly expensive explainers. We demonstrate its ease of access, extensibility and maintainability. New datasets can be added easily. Furthermore, we showcase an accompanying library and outline use cases. Users can compare models and explainers across a variety of datasets. THERMOSTAT democratizes explainability research to a certain degree, as it mitigates the computational (environmental and financial) and implementational burden. Machine-generated explanations become accessible to non-technical researchers. Furthermore, comparability and replicability are increased. Consulting the literature in Section 1 makes apparent that interpretation beyond classification (e.g. machine translation) is still an open problem (Wallace et al., 2020). Hence, we focus on text classification tasks that are well-trodden paths.