SELFEXPLAIN: A Self-Explaining Architecture for Neural Text Classifiers

We introduce SelfExplain, a novel self-explaining model that explains a text classifier's predictions using phrase-based concepts. SelfExplain augments existing neural classifiers by adding (1) a globally interpretable layer that identifies the most influential concepts in the training set for a given sample and (2) a locally interpretable layer that quantifies the contribution of each local input concept by computing a relevance score relative to the predicted label. Experiments across five text-classification datasets show that SelfExplain facilitates interpretability without sacrificing performance. Most importantly, explanations from SelfExplain are sufficient for the model's predictions and are perceived by human judges as more adequate, trustworthy and understandable than existing widely-used baseline interpretability methods.


Introduction
Neural network models are often opaque: they provide limited insight into interpretations of model decisions and are typically treated as "black boxes" (Lipton, 2018). There has been ample evidence that such models overfit to spurious artifacts (Gururangan et al., 2018; McCoy et al., 2019; Kumar et al., 2019) and amplify biases in data (Zhao et al., 2017; Sun et al., 2019). This underscores the need to understand model decision making.
Prior work in interpretability for neural text classification predominantly follows two approaches: (i) post-hoc explanation methods that explain predictions for previously trained models based on model internals, and (ii) inherently interpretable models whose interpretability is built-in and optimized jointly with the end task. While post-hoc methods (Simonyan et al., 2014; Koh and Liang, 2017; Ribeiro et al., 2016) are often the only option for already-trained models, inherently interpretable models (Melis and Jaakkola, 2018; Arik and Pfister, 2020) may provide greater transparency since explanation capability is embedded directly within the model (Kim et al., 2014; Doshi-Velez and Kim, 2017; Rudin, 2019).

1 Code and data are publicly available at https://github.com/dheerajrajagopal/SelfExplain

Figure 1: A sample of interpretable concepts from SELFEXPLAIN for a binary sentiment analysis task (input: "The fantastic actors elevated the movie", predicted sentiment: positive; the panels show input word attributions, e.g., fantastic actors (0.7), elevated (0.1), the top relevant concepts, and the influential training concepts). Compared to saliency-map style word attributions, SELFEXPLAIN can provide explanations via concepts in the input sample and concepts in the training data.
In natural language applications, feature attribution based on attention scores (Xu et al., 2015) has been the predominant method for developing inherently interpretable neural classifiers. Such methods interpret model decisions locally by explaining the classifier's decision as a function of relevance of features (words) in input samples. However, such interpretations were shown to be unreliable (Serrano and Smith, 2019; Pruthi et al., 2020) and unfaithful (Jain and Wallace, 2019; Wiegreffe and Pinter, 2019). Moreover, with natural language being structured and compositional, explaining the role of higher-level compositional concepts like phrasal structures (beyond individual word-level feature attributions) remains an open challenge. Another known limitation of such feature attribution based methods is that the explanations are limited to the input feature space and often require additional methods (e.g. Han et al., 2020) for providing global explanations, i.e., explaining model decisions as a function of influential training data.
In this work, we propose SELFEXPLAIN, a self-explaining model that incorporates both global and local interpretability layers into neural text classifiers. Compared to word-level feature attributions, we use high-level phrase-based concepts, producing a more holistic picture of a classifier's decisions. SELFEXPLAIN incorporates: (i) a Locally Interpretable Layer (LIL), which quantifies, via activation differences, the relevance of each concept to the final label distribution of an input sample, and (ii) a Globally Interpretable Layer (GIL), which uses maximum inner product search (MIPS) to retrieve the most influential concepts from the training data for a given input sample. We show how GIL and LIL layers can be integrated into transformer-based classifiers, converting them into self-explaining architectures. The interpretability of the classifier is enforced through regularization (Melis and Jaakkola, 2018), and the entire model is end-to-end differentiable. To the best of our knowledge, SELFEXPLAIN is the first self-explaining neural text classification approach to provide both global and local interpretability in a single model.
Ultimately, this work makes a step towards combining the generalization power of neural networks with the benefits of interpretable statistical classifiers with hand-engineered features: our experiments on three text classification tasks spanning five datasets with pretrained transformer models show that incorporating LIL and GIL layers facilitates richer interpretability while maintaining end-task performance. The explanations from SELFEXPLAIN sufficiently reflect model predictions and are perceived by human annotators as more understandable, more trustworthy, and better at justifying the model predictions, compared to strong baseline interpretability methods.

SELFEXPLAIN
Let M be a neural C-class classification model that maps X → Y, where X are the inputs and Y are the outputs. SELFEXPLAIN builds interpretability directly into M, providing a set of explanations Z via high-level "concepts" that explain the classifier's predictions. We first define interpretable concepts in §2.1. We then describe how these concepts are incorporated into a concept-aware encoder in §2.2. In §2.3, we define our Local Interpretability Layer (LIL), which provides local explanations by assigning relevance scores to the constituent concepts of the input. In §2.4, we define our Global Interpretability Layer (GIL), which provides global explanations by retrieving influential concepts from the training data. Finally, in §2.5, we describe the end-to-end training procedure and optimization objectives.

Defining human-interpretable concepts
Since natural language is highly compositional (Montague, 1970), it is essential that interpreting a text sequence goes beyond individual words. We define the set of basic units that are interpretable by humans as concepts. In principle, concepts can be words, phrases, sentences, paragraphs or abstract entities. In this work, we focus on phrases as our concepts, specifically all non-terminals in a constituency parse tree. Given any sequence $x = \{w_i\}_{1:T}$, we decompose it into its component non-terminals $N(x) = \{nt_j\}_{1:J}$, where J denotes the number of non-terminal phrases in x.
Given an input sample x, M is trained to produce two types of explanations: (i) global explanations from the training data $X_{train}$ and (ii) local explanations, which are phrases in x. We show an example in Figure 1. Global explanations are achieved by identifying the most influential concepts $C_G$ from the "concept store" Q, which is constructed to contain all concepts from the training set $X_{train}$ by extracting the phrases under each non-terminal in a syntax tree for every data sample (detailed in §2.4). Local interpretability is achieved by decomposing the input sample x into its constituent phrases under each non-terminal in its syntax tree. Each concept is then assigned a score that quantifies its contribution to the sample's label distribution for a given task; M then outputs the most relevant local concepts $C_L$.
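To make the concept definition concrete, the sketch below extracts phrase-level concepts from a bracketed constituency parse. It assumes NLTK is available and that a parse has already been produced (e.g., with the Kitaev and Klein (2018) parser used in our experiments); the function name and the example parse string are illustrative, not part of the released code.

```python
# A minimal sketch of concept extraction (Section 2.1): every non-terminal's
# yield in a constituency parse is treated as one phrase-level "concept".
from nltk import Tree

def extract_concepts(parse_str):
    """Return the phrases under all non-terminals of a bracketed constituency parse."""
    tree = Tree.fromstring(parse_str)
    concepts = []
    for subtree in tree.subtrees():
        # Keep every non-terminal span, including the root sentence and pre-terminals.
        concepts.append(" ".join(subtree.leaves()))
    # Deduplicate while preserving order.
    seen, unique = set(), []
    for phrase in concepts:
        if phrase not in seen:
            seen.add(phrase)
            unique.append(phrase)
    return unique

if __name__ == "__main__":
    parse = "(S (NP (DT The) (JJ fantastic) (NNS actors)) (VP (VBD elevated) (NP (DT the) (NN movie))))"
    print(extract_concepts(parse))
    # ['The fantastic actors elevated the movie', 'The fantastic actors', 'The', ...]
```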

Concept-Aware Encoder E
We obtain the encoded representation of our input sequence $x = \{w_i\}_{1:T}$ from a pretrained transformer model (Vaswani et al., 2017; Liu et al., 2019) by extracting the final layer output as $\{h_i\}_{1:T}$. Additionally, we compute representations of the concepts, $\{u_j\}_{1:J}$. Each non-terminal $nt_j$ in x is represented as the mean of its constituent word representations, $u_j = \frac{1}{len(nt_j)} \sum_{w_i \in nt_j} h_i$, where $len(nt_j)$ denotes the number of words in the phrase $nt_j$. To represent the root node (S) of the syntax tree, $nt_S$, we use the pooled representation of the pretrained transformer, denoted $u_S$ for brevity.2 Following the traditional neural classifier setup, the output of the classification layer $l_Y$ is computed as follows:

$$ l_Y = \mathrm{softmax}\big(W_y^\top\, g(u_S)\big), \qquad P_C = \arg\max_c\, (l_Y)_c $$

where g is a relu activation layer, $W_y \in \mathbb{R}^{D \times C}$, and $P_C$ denotes the index of the predicted class.

2 We experimented with different pooling strategies (mean pooling, sum pooling and the pooled [CLS] token representation) and all of them performed similarly. We chose the pooled [CLS] token for the final model as this is the most commonly used method for representing the entire input.

Figure 2: Model Architecture. Our architecture comprises a base encoder that encodes the input and its non-terminals. GIL then uses MIPS to retrieve the most influential concepts that globally explain the sample, while LIL computes a relevance score for each $nt_j$ that quantifies its relevance to predict the label. The model interpretability is enforced through regularization. Examples of top LIL concepts (extracted from the input) are {the good soup, good}, and of top GIL concepts (from the training data) are {great food, excellent taste}.
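A minimal PyTorch sketch of this encoding step is shown below. The module name, tensor shapes and the `spans` input (token offsets of the non-terminals) are illustrative assumptions; the sketch only mirrors the equations above, not the exact released implementation.

```python
# Sketch of the concept-aware encoder outputs (Section 2.2): phrase representations u_j,
# root representation u_S, and the label distribution l_Y.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptAwareHead(nn.Module):
    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.w_y = nn.Linear(hidden_dim, num_classes)  # W_y in R^{D x C}

    def phrase_reprs(self, hidden, spans):
        # u_j: mean of the constituent word representations of each non-terminal.
        return torch.stack([hidden[s:e].mean(dim=0) for s, e in spans])

    def forward(self, hidden, pooled, spans):
        u_j = self.phrase_reprs(hidden, spans)              # (J, D)
        u_s = pooled                                        # pooled [CLS] representation
        l_y = F.softmax(self.w_y(F.relu(u_s)), dim=-1)      # label distribution l_Y
        return u_j, u_s, l_y

# Dummy usage: 10 tokens, hidden size 16, 2 classes, two non-terminal spans.
head = ConceptAwareHead(16, 2)
hidden, pooled = torch.randn(10, 16), torch.randn(16)
u_j, u_s, l_y = head(hidden, pooled, spans=[(0, 3), (3, 6)])
print(l_y.argmax().item())  # index of the predicted class P_C
```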

Local Interpretability Layer (LIL)
For local interpretability, we compute a local relevance score for all input concepts $\{nt_j\}_{1:J}$ from the sample x. Approaches that assign relative importance scores to input features through activation differences (Shrikumar et al., 2017; Montavon et al., 2017) are widely adopted for interpretability in computer vision applications. Motivated by this, we adopt a similar approach for NLP, where we learn the attribution of each concept to the final label distribution via activation differences. Each non-terminal $nt_j$ is assigned a score that quantifies its contribution to the label in comparison to the contribution of the root node $nt_S$. The most contributing phrases $C_L$ are used to locally explain the model decisions.
Given the encoder E, LIL computes the contribution solely from $nt_j$ to the final prediction. We first build a representation of the input without the contribution of phrase $nt_j$ and use it to score the labels:

$$ s_j = \mathrm{softmax}\big(W_y^\top\, g(u_S - u_j)\big) $$

Here, $s_j$ signifies a label distribution without the contribution of $nt_j$. Using this, the relevance score of each $nt_j$ for the final prediction is given by the difference between the classifier score for the predicted label based on the entire input and the label score based on the input without $nt_j$:

$$ r_j = (l_Y)_{P_C} - (s_j)_{P_C} $$

where $r_j$ is the relevance score of the concept $nt_j$.
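The following sketch mirrors the LIL computation as reconstructed above: each concept's contribution is the difference in the predicted-class probability with and without that concept. It assumes the label distribution without $nt_j$ is obtained by scoring $u_S - u_j$ with the same classification head; all tensor names are illustrative, not the authors' exact code.

```python
# Sketch of LIL relevance scores (Section 2.3) via activation differences.
import torch
import torch.nn as nn
import torch.nn.functional as F

def lil_relevance(w_y, u_s, u_j):
    """Return r_j = l_Y[P_C] - s_j[P_C] for every concept representation in u_j (J, D)."""
    l_y = F.softmax(w_y(F.relu(u_s)), dim=-1)           # distribution from the full input
    p_c = l_y.argmax()                                   # predicted class index P_C
    s_j = F.softmax(w_y(F.relu(u_s - u_j)), dim=-1)      # distributions without each nt_j
    return l_y[p_c] - s_j[:, p_c]                        # relevance score of each concept

# Dummy usage: hidden size 16, 2 classes, 4 concepts.
w_y = nn.Linear(16, 2)
u_s, u_j = torch.randn(16), torch.randn(4, 16)
print(lil_relevance(w_y, u_s, u_j))  # higher r_j => larger contribution to the prediction
```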

Global Interpretability layer (GIL)
The Global Interpretability Layer GIL aims to interpret each data sample x by providing a set of K concepts from the training data which most influenced the model's predictions. Such an approach is advantageous as we can now understand how important concepts from the training set influenced the model decision to predict the label of a new input, providing more granularity than methods that use entire samples from the training data for post-hoc interpretability (Koh and Liang, 2017;Han et al., 2020).
We first build a concept store Q which holds all the concepts from the training data. Given the model M, we represent each candidate concept $q_k$ from the training data as the mean-pooled representation of its constituent words, $q_k = \frac{1}{len(q_k)} \sum_{w \in q_k} e(w)$, where e represents the embedding layer of M and $len(q_k)$ the number of words in $q_k$. Q is thus a set $\{q\}_{1:N_Q}$ of $N_Q$ concepts from the training data. As the model M is finetuned for a downstream task, the representations $q_k$ are constantly updated; we re-index all candidate representations $q_k$ after a fixed number of training steps. For any input x, taking $u_S$ as the query, GIL uses dense inner product search to retrieve the top-K influential concepts $C_G$ from Q, as defined by the cosine similarity function:

$$ C_G = \underset{q_k \in Q}{\text{top-}K}\; d(u_S, q_k), \qquad d(u_S, q_k) = \frac{u_S \cdot q_k}{\lVert u_S \rVert\, \lVert q_k \rVert} $$

Differentiable retrieval through Maximum Inner Product Search (MIPS) has been shown to be effective in question-answering settings (Guu et al., 2020) for leveraging retrieved knowledge for reasoning. Motivated by this, we repurpose this retrieval approach to identify the influential concepts from the training data and learn it end-to-end via backpropagation.
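The sketch below illustrates GIL-style retrieval over a concept store using plain NumPy. Normalizing the concept vectors makes the inner product equal to cosine similarity; a production system would typically use an approximate MIPS index (e.g., FAISS) and re-index periodically during fine-tuning, as described above. All array shapes and names are illustrative.

```python
# Sketch of top-K concept retrieval (Section 2.4) from a concept store.
import numpy as np

def build_concept_store(concept_vectors):
    """Normalize concept vectors so inner product equals cosine similarity."""
    q = np.asarray(concept_vectors, dtype=np.float32)
    return q / np.linalg.norm(q, axis=1, keepdims=True)

def retrieve_top_k(store, u_s, k=5):
    """Return indices and similarity scores of the K most influential training concepts."""
    u = u_s / np.linalg.norm(u_s)
    scores = store @ u                   # cosine similarity with every q_k
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Dummy usage: 1000 training concepts with dimension 16.
store = build_concept_store(np.random.randn(1000, 16))
idx, sims = retrieve_top_k(store, np.random.randn(16), k=5)
print(idx, sims)
```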

Training
SELFEXPLAIN is trained to maximize the conditional log-likelihood of predicting the class at all the final layers: linear (for label prediction), LIL, and GIL. Regularizing models with explanation-specific losses has been shown to improve inherently interpretable models (Melis and Jaakkola, 2018) for local interpretability. We extend this idea to both the global and local interpretable outputs of our classifier. During training, we regularize the loss through the GIL and LIL layers by optimizing their outputs for the end task as well.
For the GIL layer, we aggregate the scores over all the retrieved concepts $q_{1:K}$ as a weighted sum, followed by an activation layer, a linear layer and a softmax to compute the log-likelihood loss:

$$ l_G = \mathrm{softmax}\Big(W_u^\top\, g\Big(\sum_{k=1}^{K} w_k\, q_k\Big)\Big), \qquad \mathcal{L}_G = -\sum_{c=1}^{C} y_c \log\,(l_G)_c $$

where the global interpretable concepts are denoted by $C_G = q_{1:K}$, $W_u \in \mathbb{R}^{D \times C}$, $w_k \in \mathbb{R}$, g represents relu activation, and $l_G$ represents the softmax output of the GIL layer.
For the LIL layer, we compute a weighted aggregated representation over $s_j$ and compute the log-likelihood loss analogously:

$$ l_L = \sum_{j=1}^{J} w_j\, s_j, \qquad \mathcal{L}_L = -\sum_{c=1}^{C} y_c \log\,(l_L)_c $$

where $w_j \in \mathbb{R}$ are learned weights. To train the model, we optimize the following joint loss:

$$ \mathcal{L} = \mathcal{L}_Y + \alpha\, \mathcal{L}_G + \beta\, \mathcal{L}_L $$

Here, α and β are regularization hyper-parameters and $\mathcal{L}_Y$ is the loss of the linear label-prediction layer. All loss components use the cross-entropy loss based on the task label $y_c$.
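A hedged sketch of the joint objective is given below: the label loss of the linear layer is regularized by cross-entropy losses on the GIL and LIL outputs, weighted by α and β. The logits passed in are placeholders for the outputs of the three layers; this follows the equations above rather than the authors' exact training code.

```python
# Sketch of the joint training loss (Section 2.5).
import torch
import torch.nn.functional as F

def joint_loss(logits_y, logits_g, logits_l, target, alpha=0.1, beta=0.1):
    """L = L_Y + alpha * L_G + beta * L_L, each term a cross-entropy on the task label."""
    loss_y = F.cross_entropy(logits_y, target)   # linear label-prediction layer
    loss_g = F.cross_entropy(logits_g, target)   # GIL layer output
    loss_l = F.cross_entropy(logits_l, target)   # LIL layer output
    return loss_y + alpha * loss_g + beta * loss_l

# Dummy usage: batch of 4 samples, 2 classes.
y = torch.randint(0, 2, (4,))
logits = [torch.randn(4, 2, requires_grad=True) for _ in range(3)]
loss = joint_loss(*logits, y)
loss.backward()  # in the full model this propagates through all three heads
```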

Dataset and Experiments

Datasets: We evaluate our framework on five classification datasets: (i) SST-2, a sentiment classification task (Socher et al., 2013) in which the sentiment of movie review sentences is predicted as a binary classification task; (ii) SST-5, a fine-grained sentiment classification task that uses the same dataset but casts it as a 5-class classification task; (iii) TREC-6 (https://cogcomp.seas.upenn.edu/Data/QA/QC/), a question classification task proposed by Li and Roth (2002), where each question must be classified into one of 6 question types; (iv) TREC-50, a fine-grained version of the same TREC-6 question classification task with 50 classes; and (v) SUBJ, a subjective/objective binary classification dataset (Pang and Lee, 2005). The dataset statistics are shown in Table 1.

Experimental setup: We incorporate SELFEXPLAIN into RoBERTa and XLNet, and use the above encoders without the GIL and LIL layers as baselines. We generate constituency parse trees (Kitaev and Klein, 2018) to extract target concepts for the input, and otherwise follow the same pre-processing steps as the original encoder configurations. We also maintain the hyperparameters and weights from the pre-training of the encoders. The architecture with GIL and LIL modules is fine-tuned on the datasets described above. For the number of global influential concepts K, we consider two settings, K = 5 and K = 10. We also perform hyperparameter tuning over α, β ∈ {0.01, 0.1, 0.5, 1.0} and report results for the best model configuration. All models were trained on an NVIDIA V-100 GPU.

Table 2: Performance comparison of models with and without GIL and LIL layers. All experiments used the same encoder configurations. We use the development set for SST-2 results (the test set of SST-2 is part of the GLUE benchmark) and test sets for SST-5, TREC-6, TREC-50 and SUBJ; α, β = 0.1 for all the above settings.
Classification Results: We first evaluate the utility of the classification models after incorporating the GIL and LIL layers (Table 2). Across the different classification tasks, we observe that SELFEXPLAIN-RoBERTa and SELFEXPLAIN-XLNet consistently show competitive performance compared to the base models, except for a marginal drop for SELFEXPLAIN-XLNet on the TREC-6 dataset.
We also observe that the hyperparameter K did not make a noticeable difference. Additional ablation experiments are shown in Table 3.

Explanation Evaluation
Explanations are notoriously difficult to evaluate quantitatively. A good model explanation should be (i) relevant to the current input and predictions and (ii) understandable to humans (DeYoung et al., 2020; Jacovi and Goldberg, 2020). To this end, we evaluate the explanations along the following criteria:
• Sufficiency - Do explanations sufficiently reflect the model predictions?
• Plausibility - Do explanations appear plausible and understandable to humans?
• Trustability - Do explanations improve human trust in model predictions?
From SELFEXPLAIN, we extract as model explanations for these evaluations: (i) the most relevant local concepts, i.e., the top-ranked phrases based on the relevance scores $\{r_j\}_{1:J}$ from the LIL layer, and (ii) the top influential global concepts, i.e., the most influential concepts $q_{1:K}$ ranked by the output of the GIL layer.

Do SELFEXPLAIN explanations reflect predicted labels?
Sufficiency aims to evaluate whether model explanations alone are highly indicative of the predicted label (Jacovi et al., 2018). The "Faithfulness-by-construction" (FRESH) pipeline (Jain et al., 2020) is an example of such a framework for evaluating the sufficiency of explanations: the explanations alone, without the remaining parts of the input, must be sufficient for predicting a label. In FRESH, a BERT (Devlin et al., 2019) based classifier is trained to perform the task using only the extracted explanations, without the rest of the input. An explanation that achieves high accuracy under this classifier is indicative of its ability to recover the original model prediction.
We evaluate the explanations on the sentiment analysis task. Explanations from SELFEXPLAIN are incorporated into the FRESH framework, and we compare their predictive accuracy against baseline explanation methods. Following the FRESH setup, we use the same experimental configuration and saliency-based baselines, including attention-based (Lei et al., 2016; Bastings et al., 2019) and gradient-based (Li et al., 2016) explanation methods. From Table 4, we observe that SELFEXPLAIN explanations from LIL and GIL show higher predictive performance than all the baseline methods. Additionally, GIL explanations outperform the full-text performance (an explanation that uses all of the input sample), which is often considered an upper bound for span-based explanation approaches. We hypothesize that this is because GIL explanation concepts from the training data are highly relevant for disambiguating the input text. In summary, outputs from SELFEXPLAIN are more predictive of the label than prior explanation methods, indicating higher sufficiency of explanations.
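For illustration, a drastically simplified version of this sufficiency check is sketched below: a classifier is trained on the extracted explanation spans alone and its accuracy at recovering the labels is measured. FRESH uses a BERT-based classifier; a bag-of-words logistic regression stands in here purely for illustration, and the `explanations` and `labels` lists are hypothetical placeholders.

```python
# Sketch of a sufficiency check: can the labels be predicted from explanations alone?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

explanations = ["fantastic actors", "elevated the movie",
                "dull and lifeless", "felt like it lasted for days"]
labels = [1, 1, 0, 0]  # labels the original model predicted for the full inputs

X = TfidfVectorizer().fit_transform(explanations)
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, labels, cv=2, scoring="accuracy").mean()
print(f"explanation-only accuracy: {acc:.2f}")  # higher => more sufficient explanations
```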

Are SELFEXPLAIN explanations plausible and trustable for humans?
Human evaluation is commonly used to evaluate plausibility and trustability. To this end, 14 human judges (graduate students in computer science) annotated 50 samples from the SST-2 validation set of sentiment excerpts (Socher et al., 2013). Each judge compared local and global explanations produced by the SELFEXPLAIN-XLNet model against two commonly used interpretability methods: (i) influence functions (Han et al., 2020) for global interpretability and (ii) saliency detection (Simonyan et al., 2014) for local interpretability. We follow the setup of Han et al. (2020). Each judge was provided the evaluation criteria (detailed next) with a corresponding description. The models to be evaluated were anonymized, and judges were asked to rate them according to the evaluation criteria alone. Following Ehsan et al. (2019), we analyse the plausibility of explanations, which aims to understand how users would perceive such explanations if they were generated by humans. We adopt two criteria proposed by Ehsan et al. (2019):

Adequate justification: Adequately justifying the prediction is considered an important criterion for the acceptance of a model (Davis, 1989). We evaluate the adequacy of the explanations by their top-K concepts, thresholding at 20% of the input, asking human judges: "Does the explanation adequately justify the model prediction?" Participants deemed explanations that were irrelevant or incomplete as less adequately justifying the model prediction. Human judges were shown: (i) the input, (ii) the gold label, (iii) the predicted label, and (iv) explanations from the baselines and SELFEXPLAIN. The models were anonymized and shuffled. Figure 3 (left) shows that SELFEXPLAIN achieves a gain of 32% in perceived adequate justification, providing evidence that humans perceived SELFEXPLAIN explanations as more plausible compared to the baselines.
Understandability: An essential criterion for transparency in an AI system is the ability of a user to understand the model's explanations. Our understandability metric evaluates whether a human judge can understand the explanations presented by the model, which would equip a non-expert to verify the model predictions. Human judges were presented with (i) the input, (ii) the gold label, (iii) the predicted sentiment label, and (iv) explanations from the different methods (baselines and SELFEXPLAIN), and were asked to select the explanation they perceived to be more understandable. Figure 3 (right) shows that SELFEXPLAIN achieves a 29% improvement over the best-performing baseline in terms of understandability of the model explanation.

Sample | P_C | Top relevant phrases from LIL | Top influential concepts from GIL
the iditarod lasts for days-this just felt like it did . | neg | for days | exploitation piece, heart attack
corny, schmaltzy and predictable, but still manages to be kind of heart warming, nonetheless. | pos | corny, schmaltzy, of heart | successfully blended satire, spell binding fun
suffers from the lack of a compelling or comprehensible narrative . | neg | comprehensible, the lack of | empty theatres, tumble weed
the structure the film takes may find matt damon and ben affleck once again looking for residuals as this officially completes a good will hunting trilogy that was never planned . | pos | the structure of the film | bravo, meaning and consolation

Table 5: Sample output from the model and its corresponding local and global interpretable outputs on SST-2 (P_C stands for predicted class; some input text cut for brevity). More qualitative examples are in Appendix §A.2.
Trustability: In addition to plausibility, we also evaluate user trust in the explanations (Singh et al., 2019; Jin et al., 2020). We follow the same experimental setup as Singh et al. (2019) and Jin et al. (2020) to compute the mean trust score. For each data sample, subjects were shown the explanations and the model prediction from the three interpretability methods and were asked to rate, on a Likert scale of 1-5, how much trust each model explanation instilled. Figure 4 shows the mean trust score of SELFEXPLAIN in comparison to the baselines. We observe that concept-based explanations are perceived as more trustworthy by humans.

Analysis

Table 5 shows example interpretations by SELFEXPLAIN; we present some additional analysis of these explanations in this section (further analysis appears in the appendix due to space constraints).
Does SELFEXPLAIN's explanation help predict model behavior? In this setup, humans are presented with an explanation and an input, and must correctly predict the model's output (Doshi-Velez and Kim, 2017; Lertvittayakumjorn and Toni, 2019; Hase and Bansal, 2020). We randomly selected 16 samples spanning an equal number of true positives, true negatives, false positives and false negatives from the dev set. Three human judges were tasked with predicting the model decision with and without the presence of the model explanation. We observe that when users were presented with the explanation, their ability to predict the model decision improved by an average of 22%, showing that with SELFEXPLAIN's explanations, humans could better understand the model's behavior.
Performance Analysis: For GIL, we study the performance trade-off of varying the number of retrieved influential concepts K. From a performance perspective, there is only a marginal drop in moving from the base model to the SELFEXPLAIN model with both GIL and LIL (shown in Table 6). From our experiments with human judges, we found that for sentence-level classification tasks K = 5 is preferable for a balance of performance and ease of interpretability.

Table 6: Effect of K from GIL. We use SELFEXPLAIN-XLNet on SST-2 for this analysis. *K = 1/5/10 did not show a considerable difference among them.

LIL-GIL-Linear layer agreement: To understand whether our explanations lead to predicting the same label as the model's prediction, we analyze whether the final logit activations of the GIL and LIL layers agree with the linear layer activations. Towards this, we compute an agreement between the label distributions from the GIL and LIL layers and the distribution of the linear layer.
For SELFEXPLAIN-XLNet on the SST-2 dataset, the LIL-linear agreement F1 is 96.6%, the GIL-linear agreement F1 is 100%, and the GIL-LIL-linear agreement F1 is 96.6%. The agreement rates between the GIL, LIL and linear layers are thus very high, validating that SELFEXPLAIN's layers agree on the same classification prediction, i.e., that GIL and LIL concepts lead to the same predictions.
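The agreement computation itself is straightforward; a toy sketch is shown below, where the prediction arrays are placeholders for the argmax of each layer's label distribution over the evaluation set.

```python
# Sketch of the layer-agreement analysis between the linear, GIL and LIL outputs.
import numpy as np
from sklearn.metrics import f1_score

linear_pred = np.array([1, 0, 1, 1, 0])   # argmax of the linear layer distribution
gil_pred    = np.array([1, 0, 1, 1, 0])   # argmax of the GIL layer distribution
lil_pred    = np.array([1, 0, 1, 0, 0])   # argmax of the LIL layer distribution

print("GIL-linear F1:", f1_score(linear_pred, gil_pred, average="macro"))
print("LIL-linear F1:", f1_score(linear_pred, lil_pred, average="macro"))
print("GIL-LIL-linear agreement:",
      np.mean((linear_pred == gil_pred) & (linear_pred == lil_pred)))
```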
Are LIL concepts relevant? For this analysis, we randomly selected 50 samples from the SST-2 dev set and removed the most salient phrases ranked by LIL. Annotators were asked to predict the label without the most relevant local concept, and their accuracy dropped by 7%. We also computed the SELFEXPLAIN-XLNet classifier's accuracy on the same ablated input, and its accuracy dropped by ∼14% (statistically significant by a Wilson interval test). This suggests that LIL captures relevant local concepts; samples from this experiment are shown in §A.3.

Stability: do similar examples have similar explanations? Melis and Jaakkola (2018) argue that a crucial property that interpretable models need to address is stability: the model should be robust enough that a minimal change in the input does not lead to drastic changes in the observed interpretations. We qualitatively analyze this by measuring the overlap of SELFEXPLAIN's extracted concepts for similar examples. Table 7 shows a representative example in which minor variations in the input lead to differently ranked local phrases, but the global influential concepts remain stable.

Input | Top LIL interpretations | Top GIL interpretations
it 's a very charming and often affecting journey | often affecting, very charming | scenes of cinematic perfection that steal your heart away, submerged, that extravagantly
it 's a charming and often affecting journey of people | of people, charming and often affecting | scenes of cinematic perfection that steal your heart away, submerged, that extravagantly

Table 7: Sample (from SST-2) of an input perturbation leading to different local concepts, while global concepts remain stable.

Related Work
Post-hoc Interpretation Methods: Predominant methods for post-hoc interpretability in NLP are gradient-based (Simonyan et al., 2014; Sundararajan et al., 2017; Smilkov et al., 2017). Other post-hoc interpretability methods, such as Singh et al. (2019) and Jin et al. (2020), decompose relevant and irrelevant aspects from hidden states and obtain a relevance score. While the methods above focus on local interpretability, works such as Han et al. (2020) aim to retrieve influential training samples for global interpretations. Global interpretability methods are useful not only to facilitate explainability, but also to detect and mitigate artifacts in data (Pezeshkpour et al., 2021; Han and Tsvetkov, 2021).
Inherently Interpretable Models: Heat maps based on attention (Bahdanau et al., 2014) are one of the most commonly used interpretability tools for many downstream tasks such as machine translation (Luong et al., 2015), summarization (Rush et al., 2015) and reading comprehension (Hermann et al., 2015). Another recent line of work explores collecting rationales (Lei et al., 2016) through expert annotations (Zaidan and Eisner, 2008). Closer to our work, concept-based explanation methods (Koh et al., 2020; Yeh et al., 2020) explain model decisions through higher-level concepts rather than individual input features. They were recently proposed for computer vision applications, but despite their promise have not yet been adopted in NLP.

Conclusion
In this paper, we propose SELFEXPLAIN, a novel self-explaining framework that enables explanations through higher-level concepts, improving over low-level word attributions. SELFEXPLAIN provides both local explanations (via the relevance of each input concept) and global explanations (through influential concepts from the training data) in a single framework via two novel modules (LIL and GIL), and is trainable end-to-end. Through human evaluation, we show that our interpreted model outputs are perceived as more trustworthy, understandable, and adequate for explaining model decisions compared to previous approaches to explainability. This opens an exciting research direction for building inherently interpretable models for text classification. Future work will extend the framework to other tasks and to longer contexts beyond single input sentences.

A.2 Qualitative Examples

Table 9 shows some qualitative examples from our best performing SST-2 model.

A.3 Relevant Concept Removal
Original input | Prediction | Input with most relevant local concept removed | Prediction
we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity . | positive | we root for ( clara and paul ) , ___________ , though perhaps it 's an emotion closer to pity . | negative
the emotions are raw and will strike a nerve with anyone who 's ever had family trauma . | positive | __________ are raw and will strike a nerve with anyone who 's ever had family trauma . | negative
holden caulfield did it better . | negative | holden caulfield __________ . | positive
it 's an offbeat treat that pokes fun at the democratic exercise while also examining its significance for those who take part . | positive | it 's an offbeat treat that pokes fun at the democratic exercise while also examining _________ for those who take part . | negative
as surreal as a dream and as detailed as a photograph , as visually dexterous as it is at times imaginatively overwhelming . | positive | _______________ and as detailed as a photograph , as visually dexterous as it is at times imaginatively overwhelming . | negative
holm ... embodies the character with an effortlessly regal charisma . | positive | holm ... embodies the character with ____________ | negative
it 's hampered by a lifetime-channel kind of plot and a lead actress who is out of her depth . | negative | it 's hampered by a lifetime-channel kind of plot and a lead actress who is ____________ . | negative

Table 10: Samples where the model predictions flipped after removing the most relevant local concept.