KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference

In order to better understand the reason behind model behaviors (i.e., making predictions), most recent works have exploited generative models to provide complementary explanations. However, existing approaches in NLP mainly focus on "WHY A" rather than contrastive "WHY A NOT B", which has been shown to better distinguish confusing candidates and improve data efficiency in other research fields. In this paper, we focus on generating contrastive explanations with counterfactual examples in NLI and propose a novel Knowledge-Aware Contrastive Explanation generation framework (KACE). Specifically, we first identify rationales (i.e., key phrases) from input sentences, and use them as key perturbations for generating counterfactual examples. After obtaining qualified counterfactual examples, we take them along with the original examples and external knowledge as input, and employ a knowledge-aware generative pre-trained language model to generate contrastive explanations. Experimental results show that contrastive explanations are beneficial in that they clarify the difference between the predicted answer and other possible wrong ones. Moreover, we train an NLI model enhanced with contrastive explanations and achieve an accuracy of 91.9% on SNLI, gaining improvements of 5.7% over ETPA ("Explain-Then-Predict-Attention") and 0.6% over NILE ("WHY A").


Introduction
In recent years, pre-trained language models (Devlin et al., 2019;Liu et al., 2019;Yang et al., 2019) have been widely adopted in many natural language processing tasks (Talmor et al., 2019;Choi et al., 2018;Bowman et al., 2015). However, due to the lack of textual explanations, most downstream models have become more complicated and difficult to understand. End users, especially those working in critical domains such as healthcare or online education, become more skeptical and reluctant to adopt or trust them, although these models have been proven to improve decision-making performance. Therefore, providing faithful textual explanations has become a promising way to overcome the black-box property of neural networks, and has attracted attention from both academia and industry.
Recently, the majority of existing methods (Xu et al., 2020;Cheng et al., 2020;Karimi et al., 2020;Ramamurthy et al., 2020;Atanasova et al., 2020;Kumar and Talukdar, 2020) in natural language processing try to explain the predictions of neural models in a model-intrinsic or model-agnostic (also known as post-hoc) way. While post-hoc models (Chen et al., 2020b;Karimi et al., 2020;Kumar and Talukdar, 2020) provide explanations after making predictions without affecting the overall accuracy, most of them neglect the rationales in inputs and provide textual explanations only in the form of "WHY A". However, we argue that contrastive explanations in the form of "WHY A NOT B" provide more informative and important clues that are easier for end-users to understand and more persuasive. Moreover, we believe that contrastive explanations could benefit downstream tasks (e.g., NLI), since such explanations contain more helpful information (e.g., relations between rationales) that can be used to improve model performance.
To further enhance the explainability and performance of NLI, we propose a novel textual contrastive explanation generation framework in this paper, which is post-hoc and considers rationales, counterfactual examples, and external knowledge. Specifically, we first identify rationales (i.e., key phrases) from a premise-hypothesis (P-H) pair with label A, and then use them as the key perturbations for transforming and generating candidate counterfactual examples. We then select the most qualified counterfactual example for each other label B. Note that acquiring a qualified counterfactual example of class B is essential for generating a meaningful explanation for "WHY NOT B"; otherwise the resultant contrastive explanation will be groundless or useless. After that, we take the selected examples along with the original P-H pair and related external knowledge as input, and finally employ a knowledge-aware pre-trained language model to generate a contrastive explanation, which specifies why the predicted label is A rather than B and clarifies the confusion for end-users. Moreover, we train an NLI model enhanced with contrastive explanations and achieve new state-of-the-art performance on SNLI.

Figure 1: The overall workflow of contrastive explanation generation, which contains rationale identification, counterfactual example generation (as described in Figure 2) and selection, and knowledge-aware contrastive explanation generation. In our "WHY A NOT B" paradigm, we generate explanations for A and each other class B (i.e., "WHY NOT neutral" and "WHY NOT entailment" in this example). The counterfactual example selection aims to select the single most qualified example for each other class B.
The contributions of this paper are as follows:
• We introduce a novel knowledge-aware contrastive explanation generation framework (KACE) for natural language inference tasks.
• We consider the rationales in inputs and regard them as important perturbations for generating counterfactual examples rather than just discarding them like previous post-hoc work (Hendricks et al., 2018;Cheng et al., 2020).
• We integrate external knowledge with a generative pre-trained language model rather than only taking original inputs (Kumar and Talukdar, 2020;Rajani et al., 2019) for contrastive explanation generation.
• Experimental results show that knowledge-aware contrastive explanations are able to clarify the difference between the predicted class and the others, which helps to resolve end-users' confusion and improves model performance more than "WHY A" explanations.

Task Definition and Overall Workflow
Here, we define the task of contrastive explanation generation for NLI. Given a trained neural network model f with input x and predicted class A, the problem of generating contrastive explanations (CE) for input x is to specify why x belongs to category/class A rather than B. The task proceeds in three steps. First, we identify a set of rationales in the given inputs, as described in Section 3.1. Second, we generate counterfactual examples with a reversal mechanism, as presented in Section 3.2. Third, we take the selected counterfactual example along with the original example and external knowledge as input, and employ a knowledge-aware generator to produce the contrastive explanation, as detailed in Section 3.3.
Approach

Rationale Identification
Considering that rationales are important features of an instance, it is essential to regard them as key perturbations for counterfactual example generation. In this paper, we formulate rationale identification as a token-level sequence labelling task where 1 indicates a rationale token and 0 indicates a background token. Similar to Thorne et al. (2019), we first construct the input sequences for a premise p and a hypothesis h as $S_p = \langle s \rangle\ \text{Label}\ \langle s \rangle\ \text{Premise}\ \langle s \rangle$ and $S_h = \langle s \rangle\ \text{Hypothesis}\ \langle s \rangle$, where $\langle s \rangle$ is a special token that separates the components. Let y represent the relation between $S_p$ and $S_h$, where y ∈ {entailment, contradiction, neutral}. For each instance, we need to identify a subset r of zero or more tokens as rationales from both the premise and hypothesis sentences. Both premise and hypothesis are encoded with RoBERTa (Liu et al., 2019), yielding hidden representations $H_p = [\cdots, h^p_j, \cdots]$ and $H_h = [\cdots, h^h_i, \cdots]$ respectively. Following the rationalizer proposed by Zhao and Vydiswaran (2021), we use cross attention to embed the hypothesis (premise) into the premise (hypothesis), which is defined as:

$$a_{ij} = \frac{\exp\big(h^h_i W_1 (h^p_j)^\top\big)}{\sum_{k=1}^{L_p} \exp\big(h^h_i W_1 (h^p_k)^\top\big)}$$

where $a_{ij}$ denotes the attention score of the $j$-th token in the premise to the $i$-th token in the hypothesis, $L_p$ denotes the length of the premise sentence, and $W_1$ is a trainable parameter matrix. The representation of the $i$-th token in the hypothesis, denoted as $\hat{h}^h_i$, is created by concatenating its original state representation, the max-pooling representation over $H_p$, and the corresponding sum of attention representations from $H_p$. At last, we use a softmax layer with a linear transformation to model the probability of the $i$-th token in $S_h$ being a rationale token.
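A rough PyTorch sketch of this cross-attention scoring is shown below. It is illustrative only and not the exact rationalizer of Zhao and Vydiswaran (2021); the bilinear attention form, the tensor shapes, and the order in which the three representations are concatenated are assumptions.

```python
import torch
import torch.nn as nn

class RationaleScorer(nn.Module):
    """Rough sketch of cross-attention rationale scoring for hypothesis tokens.

    H_p: premise token states (B, Lp, d); H_h: hypothesis token states (B, Lh, d),
    e.g., taken from RoBERTa. Pooling and concatenation details are assumptions.
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        self.W1 = nn.Linear(hidden_size, hidden_size, bias=False)   # trainable W_1
        # Concatenate [h_h; max-pool(H_p); attention-sum over H_p] -> 3d features.
        self.classifier = nn.Linear(3 * hidden_size, 2)             # rationale vs. background

    def forward(self, H_p: torch.Tensor, H_h: torch.Tensor) -> torch.Tensor:
        # a_ij: attention of the j-th premise token to the i-th hypothesis token.
        scores = torch.bmm(self.W1(H_h), H_p.transpose(1, 2))       # (B, Lh, Lp)
        attn = torch.softmax(scores, dim=-1)
        attended = torch.bmm(attn, H_p)                              # attention-weighted sum of H_p
        pooled = H_p.max(dim=1).values.unsqueeze(1).expand_as(H_h)   # max-pooling over the premise
        features = torch.cat([H_h, pooled, attended], dim=-1)        # \hat{h}^h_i
        return torch.log_softmax(self.classifier(features), dim=-1)  # token-level label scores

# Usage: logits = RationaleScorer(768)(torch.randn(1, 12, 768), torch.randn(1, 9, 768))
```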

Counterfactual Example Generation
As we have introduced above, counterfactual examples of other classes are of key importance to generate contrastive explanations. In this part, we describe how to generate counterfactual examples.
Given a trained neural network model f, the problem of generating counterfactual examples for an instance x is to find a set of examples $c_1, c_2, \ldots, c_k$ that lead to a desired prediction $y'$.
The counterfactual examples are explainable and contrastive when they appropriately consider proximity, diversity and validity.
Here, we define a three-part loss function to select qualified counterfactual examples:

$$L = L_{valid} + \lambda_1 L_{dist} + \lambda_2 L_{div},$$

where $\lambda_1$ and $\lambda_2$ are hyperparameters for balancing $L_{dist}$ and $L_{div}$. The validity term $L_{valid}$ ensures that the generated counterfactual examples receive the desired prediction target from f. Meanwhile, the generated examples should be proximal to the original instance as described in (Cheng et al., 2020), which means only a small change should be made. We do not expect a large change that transforms most of the original; otherwise there would be no difference from merely presenting an example of the counter class, and the corresponding explanation would be uninformative or useless. That is, we expect the resultant examples to preserve the main content of the input while changing domain-related parts.
In this paper, we choose a weighted Heterogeneous Manhattan-Overlay Metric (Wilson and Martinez, 1997) to calculate the distance term $L_{dist}$, where the weight of a token position depends on whether it is a rationale. To achieve diversity, we want the generated examples to be different from each other; specifically, we calculate the pairwise distances within a set of counterfactual examples and minimize a diversity term $L_{div}$ that penalizes examples that are close to each other. After defining the loss function, we use a reversal mechanism to produce counterfactual examples. In the reversal mechanism, we use hypernyms and hyponyms of tokens from WordNet for perturbation.
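The selection criterion can be sketched as follows. This is a simplified illustration under stated assumptions: the cross-entropy validity term, the token-overlap stand-in for the weighted HMOM distance, the inverse-distance diversity penalty, and the `model(candidates)` interface are hypothetical and not the exact formulations used in our implementation.

```python
import itertools
import torch
import torch.nn.functional as F

def token_distance(a, b, weights=None):
    """Weighted token-overlap distance (a simple stand-in for the weighted HMOM distance)."""
    weights = weights or [1.0] * max(len(a), len(b))
    return sum(w for x, y, w in itertools.zip_longest(a, b, weights, fillvalue=None)
               if x != y and w is not None)

def selection_loss(model, candidates, target_label, original, rationale_weights,
                   lambda1=1.0, lambda2=1.0):
    """Illustrative three-part criterion: validity + lambda1 * distance + lambda2 * diversity."""
    # Validity: the candidates should be predicted as the desired target class.
    logits = model(candidates)                                     # assumed shape (k, num_classes)
    targets = torch.full((len(candidates),), target_label, dtype=torch.long)
    l_valid = F.cross_entropy(logits, targets)

    # Proximity: stay close to the original example, weighting rationale tokens more heavily.
    l_dist = sum(token_distance(c, original, rationale_weights) for c in candidates) / len(candidates)

    # Diversity: penalize candidates that are too similar to each other.
    pairs = list(itertools.combinations(candidates, 2))
    l_div = sum(1.0 / (1.0 + token_distance(a, b)) for a, b in pairs) / max(len(pairs), 1)

    return l_valid + lambda1 * l_dist + lambda2 * l_div
```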
For example, as shown in Figure 2, the original premise and hypothesis are "a woman and a young child are making sculptures out of clay" and "a man and a woman painting on canvas", and the label is "contradiction". We find from WordNet the hypernyms of "making sculptures out of clay" and "painting on canvas", namely "doing art" and "making something" respectively. We replace the phrases with their hypernyms to obtain counterfactual examples, use the model f trained on the original P-H training dataset to predict the resultant examples (the validity term), and keep those belonging to neutral or entailment. After the validity check, we perform further selection based on the distance and diversity terms, and choose the samples with the smallest loss for neutral and for entailment for the later contrastive explanation generation.
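A minimal sketch of collecting WordNet perturbation candidates with NLTK is given below; phrase-level replacements such as "making sculptures out of clay" → "doing art" require additional chunking and mapping steps that are not shown here.

```python
# Minimal sketch of WordNet-based perturbation for a single rationale word.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def candidate_replacements(word: str, max_candidates: int = 10) -> list:
    """Collect hypernym and hyponym lemmas of `word` as candidate perturbations."""
    candidates = []
    for synset in wn.synsets(word):
        for related in synset.hypernyms() + synset.hyponyms():
            for lemma in related.lemma_names():
                name = lemma.replace("_", " ")
                if name != word and name not in candidates:
                    candidates.append(name)
    return candidates[:max_candidates]

# Example: candidate_replacements("painting") returns hypernym/hyponym lemmas
# such as more general "art"-like concepts, depending on the WordNet version.
```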

Contrastive Explanation Generation
After obtaining qualified counterfactual examples, some works (Cheng et al., 2020;Wachter et al., 2017;Verma et al., 2020) provide them directly as counterfactual explanations. However, since counterfactual examples do not provide explanations explicitly, they can be difficult for users to understand. Hence, in this part, we focus on generating contrastive explanations via a knowledge-aware generative language model, which explain "WHY A NOT B" rather than merely "WHY A".
While traditional approaches generate explanations with SHAP or LIME, recent work has exploited pre-trained generative language models (Radford et al., 2019;Lewis et al., 2020;Raffel et al., 2020). In this paper, we use a knowledge-aware pre-trained language model to generate contrastive explanations.
Knowledge Extraction Given the selected counterfactual examples and identified rationales, we extract relevant knowledge to enhance the generative language model. We acquire structured knowledge from ConceptNet and rationale definitions from a dictionary source. For ConceptNet, we extract knowledge with the Breadth-First-Search (BFS) algorithm as described in (Ji et al., 2020). For the dictionary, we extract the definitions of rationales by following (Chen et al., 2020a). After extraction, we concatenate this knowledge for training the knowledge-aware explanation generator.
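As an illustration of the ConceptNet extraction step, the sketch below runs a breadth-first search over a ConceptNet-style triple store; the in-memory `graph` dictionary, the hop limit, and the triple budget are assumptions, since the exact retrieval and filtering setup follows (Ji et al., 2020) and is not reproduced here.

```python
from collections import deque

def bfs_triples(graph, start_concept, max_hops=2, max_triples=20):
    """Breadth-first search over a ConceptNet-style graph.

    `graph` maps a concept to a list of (relation, neighbor) pairs, e.g.
    graph["clay"] = [("UsedFor", "sculpture"), ("IsA", "material")].
    Returns up to `max_triples` (head, relation, tail) triples within `max_hops`.
    """
    triples, visited = [], {start_concept}
    queue = deque([(start_concept, 0)])
    while queue and len(triples) < max_triples:
        concept, depth = queue.popleft()
        if depth >= max_hops:
            continue
        for relation, neighbor in graph.get(concept, []):
            triples.append((concept, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples[:max_triples]

# The extracted triples and dictionary definitions can then be verbalized and
# concatenated into the knowledge string K_E fed to the generator.
```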

Knowledge-Aware Explanation Generator
For contrastive explanation generation, we divide the "WHY A NOT B" problem into two simpler questions: 1) why the label of the input is A, and 2) why the label of the input is not B.
In a previous study, Kumar and Talukdar (2020) proposed a label-specific explanation generator, which fine-tunes GPT-2 independently for each label. However, that generator can only produce explanations for "WHY A". For the other part of the contrastive explanation, we collect contrastive explanations annotated by humans and use them to fine-tune a "WHY NOT B" generator.
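A rough sketch of how such a label-specific generator could be prompted at inference time is given below, using the Hugging Face GPT-2 interface; the separator string, field order and decoding settings are illustrative assumptions (the actual input format is described in the next paragraph).

```python
# Sketch of prompting a fine-tuned GPT-2 generator; the separator string, field
# order and decoding settings are illustrative assumptions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")  # in practice, a fine-tuned checkpoint

def generate_explanation(label, original, counterfactual, knowledge, max_new=100):
    sep = " <sep> "  # hypothetical separator; a real setup would register it as a special token
    prompt = sep.join([label, original, counterfactual, knowledge]) + sep
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    outputs = model.generate(**inputs, max_length=prompt_len + max_new,
                             do_sample=False, pad_token_id=tokenizer.eos_token_id)
    # Return only the continuation, i.e., the generated explanation text.
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```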
Taking a premise-hypothesis pair x along with the qualified counterfactual example x′ and the extracted knowledge $K_E$ as input, in the form of $\langle s \rangle\ \text{Label}\ \langle s \rangle\ x\ \langle s \rangle\ x'\ \langle s \rangle\ K_E\ \langle s \rangle$, our fine-tuned language model generates explanations that support the corresponding label in a "WHY A NOT B" way. With these explanations, end-users can observe and understand the difference between the original input and the counterfactual example explicitly.

Datasets

e-SNLI Camburu et al. (2018) extend the SNLI dataset to e-SNLI with natural language explanations of the ground truth labels. Annotators were asked to highlight words in the premise and hypothesis pairs which could explain the labels and to write a natural language explanation using the highlighted words. In this paper, we use the highlighted words for rationale identification and use the natural language explanations to fine-tune the language-model-based "WHY A" generator.
IMDB The IMDB dataset (Maas et al., 2011) is a movie review dataset for sentiment classification. It contains 25,000 training examples and 25,000 test examples labeled as positive or negative. In this paper, we use IMDB as an out-of-domain dataset to evaluate whether counterfactual examples can improve the robustness of our model.

Evaluation
We aim to generate contrastive explanations that can distinguish the predicted label from the others at the semantic level; hence, the BLEU score (Papineni et al., 2002) is not a proper way to measure the quality of explanations, which is better assessed by manual evaluation. In this work, we use manual evaluation and case studies to evaluate the quality of contrastive explanations. Meanwhile, we use accuracy to measure the effectiveness of the generated contrastive explanations in improving model performance through data augmentation (organized in the form of $\langle s \rangle\ \text{CE}\ \langle s \rangle\ \text{Premise}\ \langle s \rangle\ \text{Hypothesis}\ \langle s \rangle$).

Table 1: Different types of explanations, including token-level explanation, e-SNLI explanation and contrastive explanation. The explanation of e-SNLI explains why the label of a given pair is contradiction, while the contrastive explanation specifies why the label is contradiction and not neutral or entailment.

NLI with Explanation Baselines
ETPA Camburu et al. (2018) propose Explain-Then-Predict-Attention (ETPA), which first generates an explanation and then predicts the label using only the generated explanation.
NILE:post-hoc Kumar and Talukdar (2020) propose natural language inference over label-specific explanations (NILE). A premise and hypothesis pair is input to label-specific candidate explanation generators that generate natural language explanations supporting the corresponding labels. The generated explanations are then fed into an explanation processor, which predicts labels using the evidence presented in these explanations.
LIREx-base Zhao and Vydiswaran (2021) propose LIREx-base, which incorporates both a rationale-enabled explanation generator and an instance selector to select only relevant, plausible natural language explanations (NLEs) to augment NLI models, and evaluate on the standard SNLI benchmark.

Experiment Setting
For rationale identification, we use RoBERTa-base to extract hidden representations and set the learning rate to 2e-5, dropout to 0.02, batch size to 8 and the number of epochs to 10. Meanwhile, we use AdamW (Loshchilov and Hutter, 2018) as the optimizer and adopt cross-entropy loss as the loss function. In the counterfactual example generation part, we build a hypernym and hyponym table, and use hypernyms and hyponyms of tokens from WordNet for perturbation. In the contrastive explanation generation part, we use GPT-2 as the generative language model for training the "WHY A" generator and the "WHY NOT B" generator. For the generators, we set the learning rate to 5e-5, the Adam epsilon to 1e-8, and the maximum generation length to 100.

Explanation Generation for SNLI In Table 1, we present the inputs of our model and the results of our approach, which include the token-level explanation (rationales), the counterfactual example and the generated contrastive explanation, compared with the manually annotated explanation and the "WHY A" explanations generated by NILE:post-hoc and LIREx-base.

Results And Analysis
Compared with "WHY A" explanations that are simple and lack essential information, the contrastive explanation contains more information such as "making sculptures out of clay is a type of art" and "making sculptures is different from painting on canvas". As shown in Table 1, we provide not only the contrastive explanation but also the identified rationales and reversed counterfactual example for reference.
To quantitatively assess contrastive explanations, we compared our method with LIREx-base and NILE:post-hoc in terms of explanation quality through human evaluation on 100 SNLI test samples. Explanation quality refers to whether an explanation provides enough essential information for a predicted label. As shown in Table 2, the contrastive explanations produced by our method are of higher quality, scoring 2.0% and 9.0% higher than LIREx-base and NILE:post-hoc, respectively.

Explanation Enhanced NLI In Table 3, we report the experimental results of our method and other baselines, including BERT, SemBERT, CA-MTL (Pilault et al., 2021), NILE:post-hoc (Kumar and Talukdar, 2020) and LIREx-base (Zhao and Vydiswaran, 2021), on SNLI. With contrastive explanations, we are able to improve the performance of both BERT-large and RoBERTa-large. Compared with NILE:post-hoc (Kumar and Talukdar, 2020), a BERT-large model of the same scale with contrastive explanations brings a gain of 0.4% on the test set, which indicates that the knowledge-aware contrastive generator is better than the generator of NILE. Compared with LIREx-base, which uses RoBERTa-large (Zhao and Vydiswaran, 2021), the BERT-large and RoBERTa-large models with contrastive explanations bring gains of 0.3% and 1.0% respectively, which suggests that contrastive explanations are better than rationale-enabled explanations. In general, contrastive explanations achieve new state-of-the-art performance and bring it closer to human annotation (a gain of 1.1% with BERT-large). We believe that contrastive explanations contain more helpful information (e.g., relations between rationales, differences between original and counterfactual examples) that can be used to improve model performance.

Ablation Study
We perform ablation studies with BERT-large on the SNLI dataset to evaluate the impact of the different components employed in our method, and report the results in Table 4.

Out of Domain Counterfactual Example
In this part, we use the generated counterfactual examples to evaluate whether they improve the robustness of our model on the out-of-domain IMDB dataset.

Related Work

Counterfactual Example

A counterfactual example aims to find a minimal change in the data that "flips" the model's prediction and is used for explanation. Wachter et al. (2017) first propose the concept of unconditional counterfactual explanations and a framework to generate them. Hendricks et al. (2018) first consider the evidence that is discriminative for one class but not present in another class, and learn a model to generate counterfactual explanations for why a model predicts class A instead of B. In this paper, we focus on counterfactual example generation that provides contrastive examples for natural language inference.

Post-hoc Explanation Generation
For post-hoc explainable NLP systems, we can divide explanations into three types: feature-based, example-based and concept-based. For feature-based explanations, Ribeiro et al. (2016) propose LIME, and Guidotti et al. (2018) extend LIME by fitting a decision tree classifier to approximate the non-linear model; however, there is no guarantee that such explanations are faithful to the original model. For example-based explanations, Kim et al. (2016) select both prototypes and criticisms from the original data points, and Wachter et al. (2017) propose counterfactual explanations that provide alternative perturbations. For concept-based explanations, Ghorbani et al. (2019) explain model decisions through concepts that are more understandable to humans than individual features or characters. In this paper, we integrate counterfactual examples and concepts for contrastive explanation generation.

Natural Language Inference
For natural language inference, Bowman et al. (2015) propose SNLI, which contains premise and hypothesis pairs with human annotations. In order to provide interpretable and robust explanations for model decisions, Camburu et al. (2018) extend the SNLI dataset with natural language explanations of the ground truth labels, named e-SNLI. For explanation generation in NLI, Kumar and Talukdar (2020) propose NILE, which utilizes label-specific generators to produce labels along with explanations. However, Zhao and Vydiswaran (2021) find that NILE does not take into account the variability inherent in human explanations, and propose LIREx, which incorporates a rationale-enabled explanation generator. In this paper, we consider generating contrastive explanations in NLI.

Conclusion
In this paper, we focus on knowledge-aware contrastive explanation generation for NLI. We generate counterfactual examples by changing the identified rationales of given instances. Afterwards, we extract concept knowledge from ConceptNet and a dictionary to train knowledge-aware explanation generators. We show that contrastive explanations, which specify why a model makes prediction A rather than B, can provide more faithful information than "WHY A" explanations. Moreover, contrastive explanations can be used for data augmentation to improve the performance and robustness of existing models. The exploration of contrastive explanations in other NLP tasks (e.g., question answering) and better evaluation metrics for explanations will be performed in future work.

A Appendices
Reported Experimental Results Here, we report some additional experimental details for reproduction. We use 2 RTX-6000 GPUs for generator training. For each epoch, it takes 3 hours to fine-tune the contrastive generator. As we set 4 epochs for each "WHY A" generator and "WHY NOT B" generator, it takes 12 hours for each approach. There are 355M parameters in RoBERTa-large, 340M parameters in BERT-large and 345M parameters in GPT2-medium. Our code is based on PyTorch.
The Difference between Counterfactual Example and Contrastive Explanation In this paper, we generate contrastive explanations with qualified counterfactual examples. While counterfactual examples provide example-based explanations, contrastive explanations provide concept-based explanations and explain "WHY A NOT B". Meanwhile, contrastive explanations, which can integrate external knowledge from knowledge bases, are easier for end-users to understand than counterfactual examples.
Common Replaced Words Here, we show some common replaced words in reversal mechanism.
For entailment to neutral, the top 10 removed words are "man, wearing, white, blue, black, shirt, one, young, people, woman", and the top 10 inserted words are "people, there, playing, man, person, wearing, outside, two, old, near". For entailment to contradiction, the top 10 removed words are "man, wearing, white, blue, black, two, shirt, one, young, people", and the top 10 inserted words are "people, man, woman, playing, no, inside, person, two, wearing, women".
For contradiction to neutral, the top 10 removed words are "wearing, blue, black, man, white, two, red, sitting, young, standing", and the top 10 inserted words are "people, playing, man, woman, two, wearing, near, tall, men, old". For contradiction to entailment, the top 10 removed words are "wearing, blue, black, man, white, two, red, shirt, young, one", and the top 10 inserted words are "people, there, man, two, wearing, playing, people, men, woman, outside".
For neutral to entailment, the top 10 removed words are "white, wearing, shirt, black, blue, man, two, standing, young, red", and the top 10 inserted words are "playing, wearing, man, two, there, woman, people, men, near, person". For neutral to contradiction, the top 10 removed words are "white, man, wearing, shirt, black, blue, two, standing, woman, red", and the top 10 inserted words are "woman, man, there, playing, two, wearing, one, men, girl, no".
The Demand for Contrastive Explanation A "contrastive explanation" explains not only why some event A occurred, but why A occurred as opposed to some alternative event B. Some philosophers argue that agents can only be morally responsible for their choices if those choices have contrastive explanations, since the choices would otherwise be "luck infested". Moreover, if the answer predicted by a well-trained model is A but is easily confused with B, it is natural for end-users to ask why the answer is A rather than B. A similar scenario can occur when a child is learning to recognize characters or acquire other language skills. Therefore, contrastive explanation generation is essential in critical domains.