A Hierarchical Explanation Generation Method Based on Feature Interaction Detection

The opaqueness of deep NLP models has motivated efforts to explain how deep models predict. Recently, work has introduced hierarchical attribution explanations, which calculate attribution scores for compositional text hierarchically to capture compositional semantics. Existing work on hierarchical attributions tends to limit the text groups to a continuous text span, which we call the connecting rule. While easy for humans to read, limiting the attribution unit to a continuous span might lose important long-distance feature interactions for reflecting model predictions. In this work, we introduce a novel strategy for capturing feature interactions and employ it to build hierarchical explanations without the connecting rule. The proposed method can convert ubiquitous non-hierarchical explanations (e.g., LIME) into their corresponding hierarchical versions. Experimental results show the effectiveness of our approach in building high-quality hierarchical explanations.


Introduction
The opaqueness of deep natural language processing (NLP) models has increased along with their power (Doshi-Velez and Kim, 2017), which has prompted efforts to explain how these "black-box" models work (Sundararajan et al., 2017; Belinkov and Glass, 2019). This goal is usually approached with attribution methods, which assess the influence of inputs on model predictions (Ribeiro et al., 2016; Sundararajan et al., 2017; Chen et al., 2018).

Prior lines of work on attribution explanations usually calculate attribution scores for a predefined text granularity, such as word, phrase, or sentence. Recently, work has introduced the idea of hierarchical attribution, which calculates attribution scores for compositional text hierarchically to capture more information for reflecting model predictions (Singh et al., 2018; Tsang et al., 2018; Jin et al., 2019; Chen et al., 2020). As shown in Figure 1, hierarchical attribution produces a hierarchical composition of words and provides attribution scores for every text group. By providing compositional semantics, hierarchical attribution can give users a better understanding of the model decision-making process (Singh et al., 2018).

However, as illustrated in Figure 1, recent work (Singh et al., 2018; Jin et al., 2019; Chen et al., 2020) uses only continuous text spans to build hierarchical attributions, a constraint we call the connecting rule. While consistent with human reading habits, using the connecting rule as an additional prior might lose important long-distance compositional semantics. The concerns are summarized as follows. First, modern NLP models such as BERT (Devlin et al., 2019) and GPT (Radford et al., 2018, 2019) are almost all transformer-based, using self-attention mechanisms (Vaswani et al., 2017) to capture feature interactions. Since all interactions are calculated in parallel in the self-attention mechanism, a connecting rule that only considers neighboring text is incompatible with the basic operating principle of these NLP models. Second, unlike the example in Figure 1, NLP tasks often require joint reasoning over different parts of the input text (Chowdhary, 2020). For example, Figure 2(a) shows an example of the natural language inference (NLI) task, in which 'has a' and 'available' are the key compositional semantics for making the prediction: entailment. However, the connecting rule cannot highlight the compositional effect between them because they are not adjacent. Even in the relatively simple sentiment classification task, capturing long-distance compositional effects is necessary. As shown in Figure 2(b), 'courage, is inspiring' is an important combination, but its parts are not adjacent.
In this work, we introduce a simple but effective method for generating hierarchical explanations without the connecting rule. Moreover, we introduce a novel strategy for detecting feature interactions in order to capture compositional semantics. Unlike earlier hierarchical attribution approaches, which use specific algorithms to calculate attribution scores, the proposed method can convert ubiquitous non-hierarchical explanations (e.g., LIME) into their corresponding hierarchical versions. We build systems based on two classic non-hierarchical methods: LOO (Lipton, 2018) and LIME (Ribeiro et al., 2016), and the experimental results show that both systems significantly outperform existing methods. Furthermore, an ablation experiment additionally reveals the detrimental effects of the connecting rule on the construction of hierarchical explanations. Our implementation and generated explanations are available at an anonymous website: https://github.com/juyiming/HE_examples.

Method
This section explains the strategy for detecting feature interactions and the algorithm for building hierarchical explanations.

Detecting Feature Interaction
The structure of hierarchical explanations should be informative enough to capture meaningful feature interactions while displaying a sufficiently small subset of all text groups (Singh et al., 2018). Existing work uses different methods to calculate feature interactions for building hierarchical explanations. For example, Jin et al. (2019) use multiplicative interactions as the feature interaction, and Chen et al. (2020) use the Shapley interaction index (Fujimoto et al., 2006).
Unlike previous methods, our approach quantifies feature interactions based on the chosen non-hierarchical method. Specifically, given an attribution algorithm Algo, our method measures the influence of one text group on the attribution score of another. The interaction score between text groups g_i and g_j can be calculated as follows:

    Int(g_i, g_j) = abs( Algo(g_i) − Algo_{−g_j}(g_i) )

where Algo_{−g_j}(g_i) denotes the attribution score of g_i with g_j marginalized, and abs stands for taking the absolute value. Figure 3 shows an example of feature interaction detection. The non-hierarchical method LIME gives the word 'Buffet' a high attribution score, indicating that it is important for the model prediction. This score, however, sharply declines after the word 'buffet' is marginalized, indicating that 'buffet' has a strong impact on 'Buffet' under LIME. Note that in our method, different non-hierarchical attribution methods may lead to different hierarchical structures. Since the calculation principles and even the meaning of the scores vary across attribution methods, this property is more reasonable than building the same hierarchical structure for all attribution methods.
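The interaction score can be sketched in a few lines. The following Python sketch is illustrative, not the paper's implementation; `attribute` and `attribute_without` are hypothetical callables standing in for Algo and its marginalized variant, and the numbers mimic the 'Buffet'/'buffet' example but are invented:

```python
def interaction_score(attribute, attribute_without, g_i, g_j):
    """Influence of text group g_j on the attribution score of g_i.

    attribute(g)             -> attribution score of group g under Algo
    attribute_without(g, g2) -> attribution score of g with g2 marginalized
    """
    return abs(attribute(g_i) - attribute_without(g_i, g_j))

# Toy numbers: 'Buffet' scores 0.9 normally but only 0.2 once 'buffet'
# is marginalized, so the two words interact strongly under this Algo.
scores = {"Buffet": 0.9}
scores_marginalized = {("Buffet", "buffet"): 0.2}
attribute = scores.__getitem__
attribute_without = lambda g, g2: scores_marginalized[(g, g2)]
print(round(interaction_score(attribute, attribute_without,
                              "Buffet", "buffet"), 2))  # prints 0.7
```

The absolute value makes the score symmetric in sign: a group matters whether marginalizing it raises or lowers the other group's attribution.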

Table 1: AOPC(10) and AOPC(20) scores of different attribution methods on the SST and MNLI datasets. ♢ refers to methods with a hierarchical structure. del and pad refer to different modification strategies in AOPC.

Algorithm 1 Generating Hierarchical Structures
Input: sample text X with length n
Initialization: G_0 with each word as an independent text group; H_X = {G_0}

Feature marginalization. The criterion for selecting the feature marginalization approach is to avoid undermining the chosen attribution method. For example, LOO assigns attributions by the probability change on the predicted class after erasing the target text, so we use erasing as the marginalization method. For LIME, which estimates attribution scores by learning a linear approximation, we ignore the sampling points containing the target feature during linear fitting.
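The erasure-based marginalization used for LOO can be sketched as follows. This is an illustrative sketch with an invented toy model, not the paper's code; `model_prob` stands in for the predicted-class probability of the target model:

```python
def loo_attribution(model_prob, tokens, i, marginalized=()):
    """LOO attribution of token i: the drop in predicted-class probability
    after erasing it. `marginalized` holds positions already erased, which
    is the marginalization strategy described for LOO."""
    keep = [t for j, t in enumerate(tokens) if j not in marginalized]
    drop_i = [t for j, t in enumerate(tokens)
              if j not in marginalized and j != i]
    return model_prob(keep) - model_prob(drop_i)

# Toy model (invented numbers): confident only when both words co-occur.
model_prob = lambda toks: 0.9 if {"Buffet", "buffet"} <= set(toks) else 0.3
tokens = ["Buffet", "buffet"]
print(round(loo_attribution(model_prob, tokens, 0), 2))  # prints 0.6
# After marginalizing 'buffet', the attribution of 'Buffet' collapses:
print(round(loo_attribution(model_prob, tokens, 0, marginalized=(1,)), 2))  # prints 0.0
```

The gap between the two scores is exactly the kind of change the interaction score in the previous subsection measures.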

Building Hierarchical Explanations
Based on the non-hierarchical attribution algorithm Algo, our method builds the hierarchical structure of the input text and calculates attribution scores for every text group. Algorithm 1 describes the detailed procedure, which recursively chooses the two text groups with the strongest interaction and merges them into a larger one. X = (x_1, ..., x_n) denotes the model input with n words; g denotes a text group containing a set of words in X; G_t denotes the collection of all text groups at the current step t; H_X denotes the hierarchical structure of X. G_0 is initialized with each x_i as an independent text group, and H_X is initialized as {G_0}. Then, at each step, the two text groups with the highest interaction score in G_{t−1} are merged into one, and G_t is added into H_X. After n − 1 steps, all words in X are merged into one group, and H_X constitutes the final hierarchical structure of the input text.
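The procedure above is a greedy agglomerative loop and can be sketched in Python. This is an illustrative sketch, not the paper's implementation; `interaction` stands in for the Algo-based interaction score, and the toy interaction is invented to show that non-adjacent groups can merge first:

```python
from itertools import combinations

def build_hierarchy(n, interaction):
    """Greedily merge the pair of text groups with the strongest interaction.

    n:           number of words in the input X
    interaction: function (g_i, g_j) -> interaction score, where each group
                 is a frozenset of word positions
    Returns H_X as the list of group collections G_0 .. G_{n-1}.
    """
    groups = [frozenset([i]) for i in range(n)]   # G_0: one group per word
    hierarchy = [list(groups)]
    while len(groups) > 1:
        g_i, g_j = max(combinations(groups, 2), key=lambda p: interaction(*p))
        groups = [g for g in groups if g not in (g_i, g_j)] + [g_i | g_j]
        hierarchy.append(list(groups))            # G_t
    return hierarchy

# Toy interaction that is strongest between the non-adjacent words 0 and 2,
# so the first merge skips over word 1 -- no connecting rule is imposed.
toy = lambda a, b: 1.0 if {0, 2} <= (a | b) and len(a | b) == 2 else 0.1
H = build_hierarchy(3, toy)
print(H[1])  # the non-adjacent pair {0, 2} is merged first
```

Because nothing in the loop restricts candidate pairs to neighbors, the same sketch covers both the proposed method and (by filtering `combinations` to adjacent spans) the connecting-rule baselines.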

Visualization
Clear visualization is necessary for human readability. Since text groups in our hierarchical explanations are not continuous spans, the generated explanations cannot be visualized as a tree structure as in Figure 1. To keep the visualization clear and informative, we only show the newly generated unit and its attribution score at each layer. As shown in Figure 4, the bottom row shows the attribution score with each word as a text group (non-hierarchical attributions); the second row indicates that {'Buffet'} and {'buffet'} are merged together as one text group: {'Buffet, buffet'}; similarly, the fourth row indicates that {'has, a'} and {'available'} are merged together as one text group: {'available, has, a'}.

Experiment
We build systems with Leave-one-out (LOO) (Lipton, 2018) and LIME (Ribeiro et al., 2016) as the basic attribution algorithms, denoted as HE-LOO and HE-LIME. To reduce processing costs, we limit the maximum number of hierarchical layers to ten in HE-LIME.

Datasets and Models.
We adopt two text-classification datasets: the binary version of the Stanford Sentiment Treebank (SST-2) (Socher et al., 2013) and the MNLI task of the GLUE benchmark (Wang et al., 2019). We use the dev set of SST-2 and a 1,000-sample subset of MNLI (the first 500 dev-matched samples and the first 500 dev-mismatched samples) for evaluation. We build target models with BERT-base (Devlin et al., 2019) as the encoder, achieving 91.7% (SST-2) and 83.9% (MNLI) accuracy.

Evaluation Metrics.
Following previous work, we use the area over the perturbation curve (AOPC) to perform quantitative evaluation. By modifying the top k% of words, AOPC calculates the average change in the prediction probability on the predicted class as follows:

    AOPC(k) = (1/N) Σ_{i=1}^{N} ( p(ŷ | x_i) − p(ŷ | x̃_i) )

where p(ŷ | ·) is the probability on the predicted class, x_i is the original sample, x̃_i is the modified sample, and N is the number of examples. A higher AOPC is better, meaning that the words chosen by the attribution scores are more important [2].
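Under the definition above, AOPC can be sketched as follows. This is an illustrative implementation with an invented toy model and data, not the evaluation code used in the paper:

```python
def aopc(model_prob, examples, modify, k_percent):
    """Average drop in predicted-class probability after perturbing the
    top k% of words ranked by attribution score."""
    total = 0.0
    for tokens, scores in examples:
        k = max(1, round(len(tokens) * k_percent / 100))
        top = sorted(range(len(tokens)),
                     key=lambda i: scores[i], reverse=True)[:k]
        total += model_prob(tokens) - model_prob(modify(tokens, top))
    return total / len(examples)

# Toy setup (invented numbers): the model only cares about the word 'good'.
model_prob = lambda toks: 0.9 if "good" in toks else 0.4
delete = lambda toks, pos: [t for i, t in enumerate(toks) if i not in pos]  # 'del'
examples = [(["a", "good", "movie"], [0.1, 0.9, 0.2])]
print(round(aopc(model_prob, examples, delete, 10), 2))  # prints 0.5
```

Swapping `delete` for a function that replaces positions with `<pad>` tokens gives the pad strategy described next.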
We evaluate with two modification strategies, del and pad. del modifies words by deleting them from the original text directly, while pad modifies words by replacing them with <pad> tokens. For hierarchical explanations, we gradually select words to be modified according to attribution scores. If the number of words in a text group exceeds the number of remaining words to be modified, this text group is ignored. The detailed algorithm is described in the appendix.

Results Compared to Other Methods
As shown in Table 1, we compare our approach with a number of competitive baselines. Except for LIME, none of the other baselines (hierarchical or not) shows an obvious improvement over LOO.
In contrast, our LOO-based hierarchical explanations outperform LOO on average by more than 11%. Moreover, our LIME-based hierarchical explanations outperform LIME by 6% on average and achieve the best performance. The experimental results in Table 1 demonstrate the high quality of the generated explanations and the effectiveness of our method in converting non-hierarchical explanations into their corresponding hierarchical versions.

[2] Note that because there may be multiple words in a text group in hierarchical explanations, it is impossible to increase the number of perturbed words one at a time until reaching k%. Thus, we directly calculate the change in prediction after perturbing the top k% of words, which is the same as Chen et al. (2020).

Results of Ablation Experiment
We conduct an ablation experiment with two special baselines modified from HE-LOO: HE-random and HE-adjacent. HE-random merges text groups randomly in each layer; HE-adjacent merges only adjacent text groups with the strongest interaction.
As shown in Figure 5, both the adjacent and proposed variants outperform the non-hierarchical and random baselines, demonstrating our approach's effectiveness in building hierarchical explanations. Moreover, HE-proposed consistently outperforms HE-adjacent on both datasets, demonstrating the detrimental effects of the connecting rule on generating hierarchical explanations. Note that HE-random slightly outperforms the non-hierarchical baseline on SST-2 but shows almost no improvement on MNLI. We hypothesize that this is because the input text on SST-2 is relatively short, and thus randomly combined text groups have a greater chance of containing meaningful compositional semantics.

Conclusion
In this work, we introduce an effective method for generating hierarchical explanations without the connecting rule, in which a novel strategy is used for detecting feature interactions. The proposed method can convert ubiquitous non-hierarchical explanations into their corresponding hierarchical versions. We build systems based on LOO and LIME, and the experimental results demonstrate the effectiveness of the proposed approach.

Limitation
Since there is currently no standard metric for evaluating post-hoc explanations, we use AOPC(k) as the quantitative evaluation metric, which is widely used in this research field. However, because different modification strategies might lead to different evaluation results, AOPC(k) is not strictly faithful for evaluating attribution explanations (Ju et al., 2022). Thus, we evaluate with two modification strategies, del and pad, and we did not introduce new strategies for obtaining attribution scores, which avoids the risk of unfair comparisons due to customized modification strategies mentioned in Ju et al. (2022). Even so, there is a risk of unfair comparison because AOPC(k) tends to give higher scores to erasure-based explanation methods such as LOO. We do not conduct a human evaluation because we believe a human evaluation needs a very large scale to be objective and stable, a cost we cannot afford. Thus, we post visualizations of all explanations in our experiments to demonstrate the effectiveness of our approach (https://github.com/juyiming/HE_examples).
A Experiment Details

We use the well-trained models for experiments. For methods that require sampling, such as LIME and HEDGE, we conduct experiments three times with different random seeds and report the average results.

Different sampling results will lead to instability in LIME attribution scores. Thus, in HE-LIME, when calculating the attribution scores with text group g marginalized, we do not conduct new sampling but instead select the samples that do not contain g among the existing sample points. Although this strategy reduces the sampling points participating in the linear approximation by about half, it ensures the stability of the attribution scores when calculating interaction scores for HE-LIME.

B Experimental Computation Complexity

LOO. For LOO, calculating an interaction score between two text groups is comparable to three forward passes through the network. In the first step, we need to calculate the interaction score between every two groups. In each subsequent step, we need to calculate the interaction scores between the newly generated group and the other groups. In total, we need to calculate n(n−1)/2 + (n−1)(n−2)/2 = (n−1)² interaction scores, where n refers to the sequence length of the input text. Note that by recording the model predictions during every iteration, the computational complexity can be reduced by about half.
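The counting argument in the LOO paragraph can be sanity-checked with a small sketch, under the stated assumptions that the first step scores every pair of singleton groups and each later step scores only the newly merged group against the remaining groups:

```python
def interaction_evaluations(n):
    """Number of interaction-score computations for an n-word input."""
    count = n * (n - 1) // 2             # step 1: all pairs of singletons
    for groups in range(n - 1, 1, -1):   # group count after each merge
        count += groups - 1              # new group vs. every other group
    return count

# The total closes to (n - 1)^2 for any sequence length n:
print(all(interaction_evaluations(n) == (n - 1) ** 2
          for n in range(2, 50)))  # prints True
```

Each evaluation costs roughly three forward passes, so the overall cost grows quadratically in sequence length.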

LIME.
As described in Section A, we do not conduct new sampling when calculating attribution scores after feature marginalization. To quantify feature interactions in each layer, we need to perform n linear approximations with n input features, where n refers to the sequence length of the input text.

C Evaluation
For hierarchical explanations, we gradually select words to be modified according to attribution scores. As shown in Algorithm 2, we first determine the number of words that need to be modified, denoted as k. The target set S is the word set to be modified and is initialized as an empty set. The text groups G in the hierarchical explanation are sorted according to their attribution scores score from high to low. Then, text groups in G are added to S in order until the number of words in S equals k. If the number of words in a text group is larger than the number of words still needed (k minus the number of words in S), we abandon this text group to guarantee that the number of words in S does not exceed k.

Algorithm 2 Evaluation Algorithm for Hierarchical Explanations
Input: the modified word number k, text groups G, attribution scores score
Initialize S = {}
Sort(G) according to score
for each text group g ∈ G do
    if size(g) <= k − size(S) then
        S = S ∪ g
    end if
end for
Output: S

For HE-LIME, since the attribution scores at different levels come from multiple linear fitting results, the scores at different levels cannot be compared to each other. We therefore evaluate the AOPC score of each layer separately and take the best result for HE-LIME. For a fair comparison, the best evaluation result over ten runs is selected for non-hierarchical LIME.
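Algorithm 2 translates directly into Python. This is an illustrative sketch in which text groups are represented as sets of word positions (a representation chosen here for convenience, not taken from the paper):

```python
def select_words(k, groups, scores):
    """Greedily pick at most k words from the highest-scoring text groups,
    skipping any group that would push the selection past k words."""
    selected = set()
    ranked = sorted(zip(scores, groups), key=lambda t: t[0], reverse=True)
    for _, g in ranked:
        if len(g) <= k - len(selected):
            selected |= g
    return selected

# The group {1, 2} is skipped: after taking {0} and {3}, no slots remain.
print(select_words(2, [{0}, {1, 2}, {3}], [0.9, 0.5, 0.8]))  # prints {0, 3}
```

Note that the greedy skip means fewer than k words may be selected when only large groups remain, which matches the "abandon this text group" rule above.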
Note that the maximum number of hierarchical layers in HE-LIME is limited to ten. Moreover, for convenience of reading, we also select some short examples and put them in the appendix, where a positive attribution score indicates support for the model prediction and a negative attribution score indicates opposition to it. The visualizations of hierarchical attributions show that the proposed approach not only obtains an obvious improvement in quantitative evaluation but is also easy for humans to read.

Figure 3: An example of calculating text interaction.

Figure 4: An example of visualization.

Figures 6-11: Examples of visualization.
