Lifelong Explainer for Lifelong Learners

Lifelong Learning (LL) black-box models are dynamic in that they keep learning from new tasks and constantly update their parameters. Owing to the need to utilize information from previously seen tasks, and capture commonalities in potentially diverse data, it is hard for automatic explanation methods to explain the outcomes of these models. In addition, existing explanation methods, e.g., LIME, which are computationally expensive when explaining a static black-box model, are even more inefficient in the LL setting. In this paper, we propose a novel Lifelong Explanation (LLE) approach that continuously trains a student explainer under the supervision of a teacher – an arbitrary explanation algorithm – on different tasks undertaken in LL. We also leverage the Experience Replay (ER) mechanism to prevent catastrophic forgetting in the student explainer. Our experiments comparing LLE to three baselines on text classification tasks show that LLE can enhance the stability of the explanations for all seen tasks and maintain the same level of faithfulness to the black-box model as the teacher, while being up to 10^2 times faster at test time. Our ablation study shows that the ER mechanism in our LLE approach enhances the learning capabilities of the student explainer. Our code is available at https://github.com/situsnow/LLE.


Introduction
Explaining a model's predictions to practitioners and end users, especially in the case of a black-box model, is non-trivial. Recent research on eXplainable Artificial Intelligence usually considers feature attribution as a local explanation, i.e., how much each feature contributes to the outcome of the model. Related works include backpropagation-based methods, where the influence of the model outcome is backpropagated according to gradients or layer-wise rules (Bach et al., 2015; Sundararajan et al., 2017; Smilkov et al., 2017; Erion et al., 2021); perturbation-based methods, which observe changes in model performance after feature perturbation (Schwab and Karlen, 2019; Kim et al., 2020), or approximate the local decision boundary through perturbed samples (Ribeiro et al., 2016; Lundberg and Lee, 2017); and model-based methods, which train an explainer model by optimizing an explanation-meritorious objective, such as robustness/stability (Lakkaraju et al., 2020; Alvarez-Melis and Jaakkola, 2018), which requires similar examples to have similar explanations. All these methods aim to explain static black-box models, whereas explaining dynamic ones, as in the lifelong learning (LL) (Silver et al., 2013) setting, is under-explored.
We propose a Lifelong Explanation (LLE) approach that learns to explain the outcome of a LL black-box under the supervision of a teacher explanation algorithm. The key challenge in LL is to prevent catastrophic forgetting (McCloskey and Cohen, 1989) of knowledge learnt from preceding tasks while learning from a new task. To prevent this, an Experience Replay (ER) mechanism (Li and Hoiem, 2017) is exploited to replay a small amount of past data in order to maintain performance on all seen tasks. However, the dynamicallychanging black-box model may make the ER of previously generated explanations sub-optimal. We investigate an ER mechanism that replays previously seen examples together with explanations from the teacher produced in the current step. Specifically, we incorporate the ER mechanism into the training of the student explainer, which focuses on the faithfulness of the generated explanations, i.e., how well an explanation aligns with the LL black-box model outcome.
Our empirical results show that the LLE explainer (i) enhances the stability of explanations, (ii) is as faithful to the black-box model as the teacher, and (iii) is faster than the teacher at test time. Our ablation study on ER shows that regenerating the teacher's explanations for past examples significantly improves the faithfulness and stability of the student explanations.

Problem Definition
In this paper, we consider a Lifelong Learning (LL) setting comprising a sequence of text classification tasks $\{T_1, T_2, \ldots, T_T\}$. Each task $T_t$ has its own train/validation/test sets $(D^t_{tr}, D^t_{va}, D^t_{ts})$, each of which contains a set of paired examples $\{(\mathbf{x}^t_i, y^t_i)\}_{i=1}^{n^t}$, where $\mathbf{x}^t_i$ is the input (e.g., a document), $y^t_i \in Y^t$ is the true label (e.g., a topic label), $Y^t$ denotes the label set in task $T_t$, and $n^t$ is the total number of examples in the set. The goal is to train a classifier $f_{\boldsymbol{\theta}}$ which continuously learns and accumulates knowledge from the data in each task $T_t$. Specifically, at an arbitrary step $t$, $f_{\boldsymbol{\theta}}$ optimizes a loss function over $D^t_{tr}$, i.e., $\min_{\boldsymbol{\theta}} \sum_{(\mathbf{x}^t_i, y^t_i) \in D^t_{tr}} \mathcal{L}(f_{\boldsymbol{\theta}}(\mathbf{x}^t_i), y^t_i)$. In addition, we require $f_{\boldsymbol{\theta}}$ to remember the preceding knowledge at each step $t$, so as to maintain its performance on all previous tasks, i.e., $T_1, T_2, \ldots, T_{t-1}$. To achieve this goal, the classifier $f_{\boldsymbol{\theta}}$ is usually allowed to access a memory that stores a limited number of samples from the previous tasks. The performance measure of $f_{\boldsymbol{\theta}_t}$ at each step $t$ is $\frac{1}{t}\sum_{j=1}^{t} \mathrm{acc}_{f,j}$, where $\mathrm{acc}_{f,j}$ denotes the accuracy of $f_{\boldsymbol{\theta}_t}$ on task $T_j$.
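As a minimal illustration of the performance measure above (the accuracy values are hypothetical), the running LL score at step t is simply the mean accuracy over all tasks seen so far:

```python
def ll_performance(per_task_accuracies):
    """Performance of f_theta_t at step t: (1/t) * sum_{j=1..t} acc_{f,j}."""
    t = len(per_task_accuracies)
    return sum(per_task_accuracies) / t

# Hypothetical accuracies of the classifier on tasks T_1..T_3 at step t = 3.
score = ll_performance([0.95, 0.93, 0.97])
```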
Lifelong Explanation. To explain a dynamic classifier, as in lifelong learning, we consider a new problem setting, called lifelong explanation, where at each step $t$ the input consists of a set of paired examples $\{(\mathbf{x}^t_i, f_{\boldsymbol{\theta}_t}(\mathbf{x}^t_i))\}_{i=1}^{n^t}$. The goal is to output an explanation $r^t_i$ that indicates how much each dimension of $\mathbf{x}^t_i$ contributes to the outcome $f_{\boldsymbol{\theta}_t}(\mathbf{x}^t_i)$. Our approach consists of building an explainer model $g_{\boldsymbol{\phi}}$, i.e., the student, under the supervision of a teacher algorithm, i.e., $g_{\boldsymbol{\phi}}$ uses the explanations generated by the teacher as ground truth.
This approach generalizes to dynamic classifiers the learning-to-explain approach of Situ et al. (2021) for explaining the outcome of static classifiers. Since $f_{\boldsymbol{\theta}}$ keeps updating at each step $t$, we require the explainer $g_{\boldsymbol{\phi}}$ to be able to explain the updated $f_{\boldsymbol{\theta}}$, while maintaining its explanation-meritorious performance, viz. faithfulness and stability, on the data from tasks $T_1, T_2, \ldots, T_{t-1}$.

Lifelong Explanation (LLE)
We now present the training and testing phase of our LLE algorithm (Figure 1) to explain a dynamically-changing black-box classifier.
At time step t in the training phase, we are given a task T_t, its training set D^t_tr and the black-box model f_θt (Figure 1a). We first collect the explanations r^t_i for each input x^t_i in D^t_tr from a teacher algorithm A. Here, r^t_i identifies the features (words) in the input x^t_i that are important for the prediction made by f_θt(x^t_i). We then train our LLE explainer g_φt with the set of explanations {r^t_i} for all inputs in D^t_tr according to Algorithms 1 and 2.

[Algorithm 1, Lifelong Explanation (LLE): inputs are f_θ (the underlying LL black-box classifier), g_φ (the explainer model), A (the teacher explanation method), K (the number of randomly selected examples) and M (the training memory). The procedure initializes M ← ∅ and θ_0, φ_0 randomly, then, for each incoming task T_t, trains g_φ (line 11) and updates M (line 12). Algorithm 2, the inner training loop, repeats until a stopping condition is met and returns φ.]

Training the LLE explainer differs from training a generic LL classifier in two ways. First, unlike LL, which predefines task boundaries to determine the memory-saving strategy, LLE can simply reuse the same set of memorized examples as LL, and is thus insensitive to this setting; we use sparse experience replay (d'Autume et al., 2019), which replays examples from the memory at random. Second, a generic LL algorithm saves the fixed ground-truth label y in the memory. In LLE, however, for an input in the memory M, the ground-truth explanation at time step t-1 may differ from the one at time step t, since the black-box f_θ is constantly being updated. Hence, when we train g_φ at time step t (Algorithm 1, line 11), we need to consult the teacher again for the latest explanations (Algorithm 2, lines 5-6). This 'experience replay' approach ensures that g_φ can maintain its explanatory performance on previous examples while learning from new ones.
To mitigate catastrophic forgetting, we randomly select a subset of K examples from the current training set D^t_tr and add it to the memory M (Algorithm 1, line 12).
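The training procedure can be sketched as follows. This is a minimal sketch, not the paper's implementation: the names, signatures and data layout (`teacher`, `student_fit`, lists of raw inputs per task) are our assumptions.

```python
import random

def train_lle(tasks, teacher, student_fit, K=64, replay_size=8):
    """Sketch of LLE training with experience replay (interfaces hypothetical).

    tasks:       list of training sets, one list of inputs per task T_t
    teacher:     teacher(t, x) -> explanation of f_theta_t(x) at CURRENT step t
    student_fit: one update of the explainer g_phi on a list of (x, r) pairs
    K:           number of examples from each task stored in the memory M
    """
    memory = []                                   # M <- empty set
    for t, D_tr in enumerate(tasks):
        for x in D_tr:
            batch = [(x, teacher(t, x))]          # fresh teacher explanation
            # Experience replay: past INPUTS are reused, but their ground-truth
            # explanations are RE-GENERATED by the teacher at the current step,
            # because the black-box f_theta has changed since they were stored.
            for x_old in random.sample(memory, min(replay_size, len(memory))):
                batch.append((x_old, teacher(t, x_old)))
            student_fit(batch)
        # Store K randomly selected inputs from the current task.
        memory.extend(random.sample(D_tr, min(K, len(D_tr))))
    return memory  # returned only to make the sketch easy to inspect
```

The key design point mirrored here is that the memory stores only inputs, never stale explanations; the teacher is re-queried during replay, as in Algorithm 2, lines 5-6.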
In the testing phase (Figure 1b), we no longer require the teacher algorithm A as the LLE explainer g φ φ φ has already learnt how to produce explanations for unseen examples at each time step t.

Dataset and Black-Box Model (f θ θ θ )
We randomly select ten tasks from the Amazon Customer Review dataset and fine-tune a pre-trained distilled BERT (Sanh et al., 2019) on these tasks, achieving 97% test accuracy. We use the datasets provided by the HuggingFace datasets API (https://huggingface.co/datasets/amazon_us_reviews). Details of the dataset, the training of f_θ and the accuracies appear in Appendices B.1 and B.2.

Teacher Explanation Methods (A)
We chose two existing explanation algorithms, LRP (Bach et al., 2015) and LIME (Ribeiro et al., 2016), as the teachers A in our experiments: the experiments in (Montavon et al., 2018) and (Situ et al., 2021) have shown LRP and LIME to be reliable explanation methods in terms of faithfulness and stability. In terms of efficiency, LRP requires one backpropagation pass through the underlying black-box model, while LIME needs to train a linear surrogate model using examples sampled from the neighbourhood of the instance of interest. LRP is time-consuming when the black-box model is large, and LIME is time-consuming when the sample size is large. For LIME, we include two baselines, one with sample size 100 (denoted LIME_s) and another with sample size 1000 (denoted LIME_l), to understand how sample size affects its performance.

Lifelong Explainer (g φ φ φ )
Following the sequence labeling formulation in (Situ et al., 2021), our explainer g_φ takes as input a document x and the outcome predicted by f_θ, and outputs a sequence of labels, where each label represents the discretized contribution (positive or negative) of a word in x to the outcome.
When g_φ learns from LRP, denoted LLE_lrp, the ground-truth explanations from LRP are all positive, and are categorized into high/medium/low positive based on thresholds of mean ± standard deviation of all attributions of input x. When g_φ learns from LIME, the ground-truth explanations from LIME can be greater than, equal to or lower than zero; hence, the categories are taken to be positive, neutral and negative, respectively.
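The discretization above can be sketched as follows (a minimal illustration; the use of the population standard deviation is our choice, as the paper does not specify it):

```python
import statistics

def discretize_lrp(attributions):
    """LRP attributions (all positive): label each word high/medium/low
    using mean +/- standard deviation of the input's attributions."""
    mu = statistics.mean(attributions)
    sd = statistics.pstdev(attributions)  # population std; choice is ours
    return ["high" if a > mu + sd else "low" if a < mu - sd else "medium"
            for a in attributions]

def discretize_lime(attributions):
    """LIME attributions (signed): positive / neutral / negative labels."""
    return ["positive" if a > 0 else "negative" if a < 0 else "neutral"
            for a in attributions]
```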
We use the Fairseq framework (Ott et al., 2019) to implement the explainer model g φ φ φ . Specifically, g φ φ φ is a Transformer encoder (Vaswani et al., 2017) (4 attention heads, 4 blocks) trained with a Stochastic Gradient Descent optimizer and a fixed learning rate (1e-4). For experience replay during training, we randomly select 8 samples from the memory M on top of the existing mini-batch (size 8). We train the LLE models with three random seeds for 50 epochs each and report the average results with the best checkpoints on the validation set.

Performance Metrics
Following Situ et al. (2021), we compare the faithfulness and stability of the explanations produced by our LLE with those produced by the teacher explanation methods A; we also compare the efficiency of the methods.
We measure faithfulness in terms of the ∆log-odds value after masking either the positive- or the negative-contribution words. For an input document $\mathbf{x}$ at time step $t$, the ∆log-odds is given by $\text{log-odds}(p(\hat{y} \mid f_{\boldsymbol{\theta}_t}(\mathbf{x}))) - \text{log-odds}(p(\hat{y} \mid f_{\boldsymbol{\theta}_t}(\tilde{\mathbf{x}})))$, where $\hat{y} = \arg\max_{y \in Y^t} f_{\boldsymbol{\theta}_t}(\mathbf{x})$, $\tilde{\mathbf{x}}$ is obtained by masking the positive or negative important words in $\mathbf{x}$, and $\text{log-odds}(p) = \log \frac{p}{1-p}$. To measure stability, we first select the $N$ (set to 3) most similar test documents to the current test document $\mathbf{x}$ based on pairwise n-gram similarity. We then compute the Intersection over Union (IoU) of the positive and negative important words in $\mathbf{x}$ and in each similar document $\mathbf{x}'$: $\text{IoU}(\mathbf{x}, \mathbf{x}') = \frac{1}{|L|} \sum_{l \in L} \frac{|v^l_{\mathbf{x}} \cap v^l_{\mathbf{x}'}|}{|v^l_{\mathbf{x}} \cup v^l_{\mathbf{x}'}|}$, where $L$ is the discretized label set: if the teacher is LRP, $L = \{\text{high}, \text{low}\}$, and if the teacher is LIME, $L = \{\text{positive}, \text{negative}\}$; $v^l_{\mathbf{x}}$ is the set of words with output label $l$ according to the student explainer $g_{\boldsymbol{\phi}_t}$ or the corresponding teacher A at time step $t$.
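In code, the two metrics can be sketched as follows. This is our sketch: the dict-of-sets interface for the important words, and the convention for empty label sets, are our assumptions.

```python
import math

def delta_log_odds(p_orig, p_masked):
    """Faithfulness: log-odds(p(y_hat | x)) - log-odds(p(y_hat | x_masked)),
    with log-odds(p) = log(p / (1 - p)). A larger drop after masking the
    words the explainer marked important indicates higher faithfulness."""
    log_odds = lambda p: math.log(p / (1.0 - p))
    return log_odds(p_orig) - log_odds(p_masked)

def mean_iou(words_x, words_x2, label_set):
    """Stability: per-label Intersection over Union between the important-word
    sets of a document x and a similar document x', averaged over L.
    words_x / words_x2 map each label in L to a set of words."""
    total = 0.0
    for label in label_set:
        inter = len(words_x[label] & words_x2[label])
        union = len(words_x[label] | words_x2[label])
        total += inter / union if union else 1.0  # empty-set convention: ours
    return total / len(label_set)
```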
Efficiency is measured by the average time it takes to produce explanations.
These three metrics are computed for all test sets of tasks seen so far at step t; we report the average values per test sample.

Results
Figures 2 and 3 respectively display the positive ∆log-odds, measured by masking words with positive attributions, and the IoU per test document (higher is better) for all tasks seen so far at each time step (the negative ∆log-odds are shown in Appendix C.1). To evaluate faithfulness, we also include a Random baseline, i.e., the ∆log-odds value obtained by randomly selecting k words in each test sample. We omit the Random baseline for stability because the stability of an unfaithful explanation is irrelevant, as shown in Figure 2 and in Figure 6, Appendix C.1. When LIME is the teacher, we report the LLE model under the supervision of LIME_l only, denoted LLE_lime.

Faithfulness. As seen in Figure 2, the student LLE_lrp and its teacher LRP are almost identically faithful, while LLE_lime never performs significantly worse than LIME_l, and performs marginally better than LIME_s. We also observe that all methods perform significantly better than the Random baseline. It is worth noting that the LIME family (teacher and student) is consistently and significantly more faithful than the LRP family. In addition, all methods except Random show similar fluctuations across all steps. We hypothesize that both LRP and LIME (and their students) capture the confidence changes of the underlying black-box f_θ on the examples from tasks seen so far. However, the sampling process in LIME helps capture a smoother local decision boundary than LRP, thus helping it better target the most important features and achieve a higher level of faithfulness.
Stability. As shown in Figure 3, students LLE lrp and LLE lime achieve higher stability than their teachers LRP and LIME l respectively. Further, the LRP family outperforms the LIME family, which is in contrast to the trend for faithfulness ( Figure 2). However, LLE lime performs comparably with LRP in most steps, even though its teacher is significantly worse than LRP. This shows that our LLE approach can generate more stable explanations than the teachers while maintaining faithfulness.
Efficiency. Figure 4 shows the processing time of all methods obtained with the same hardware configuration (Intel Xeon Silver 4214R, Quadro RTX 6000, 24GB RAM). The sizes of the black-box model f_θ and the LLE model g_φ are approximately 270MB and 135MB respectively. Given that LRP requires a backward relevance computation for every layer in f_θ, and LIME requires multiple forward passes (depending on sample size), while LLE requires only one forward pass through g_φ, LLE is significantly faster than all three baselines.

Experience Replay on LLE. We perform an ablation study to understand the significance of ER in LLE. Specifically, for a given teacher, we train two further LLE models: (i) one without ER during training (denoted LLE-No ER), and (ii) one that uses the explanations generated by the teacher algorithm when the black-box model first sees a task (denoted LLE-Old ER; this involves removing line 6 in Algorithm 2). Table 1 shows the ∆log-odds results after masking positive-attribution words for these two LLE models and the vanilla LLE, all with LRP as the teacher. The faithfulness of the model with the updated teacher explanations (LLE_lrp) is significantly higher than that of the other two LLE variants. Similar observations hold in the other faithfulness and stability comparisons (Appendix C.2).

Conclusion and Future Work
We have proposed a Lifelong Explanation (LLE) method that learns from a teacher and leverages an ER mechanism to explain a constantly-changing black-box model. Our experimental results show that LLE can improve the stability of a teacher's explanations and maintain a comparable level of faithfulness, while being up to two orders of magnitude faster. Our ablation study has shown the effectiveness of ER with the most recently generated explanations.
The performance of LLE in LL settings consisting of problems other than classification, e.g., relation extraction, is still under-explored. Evaluating LLE on other merits of explanations, such as simulatability (Hase and Bansal, 2020), is also an interesting research direction.

B.1 Dataset
The Amazon Customer Review dataset consists of customer comments on multiple categories of products. We extract the 'review body' and 'star rating' as the input and output for training the classifier. Further, we combine the positive ratings (4 and 5) and the negative ratings (1 and 2) to form a binary classification problem. We select the tasks Home, Outdoors, Wireless, Music, Books, Office products, Luggage, Sports, Jewellery and Video games from this dataset and use them in this order in all experiments. To ensure the classifier learns balanced information from each task, we randomly select 20,000/2,000/2,000 examples as the train/validation/test sets for each of the ten tasks.

B.2 Training of black-box f θ θ θ
We train the black-box model f_θ using an Adam optimizer (Loshchilov and Hutter, 2019) (0.1 weight decay and 1e-5 learning rate) for one epoch. To prevent catastrophic forgetting, we randomly save training examples of each task into the memory M, maintaining a fixed memory size (64 examples) per task. We randomly replay 64 examples from M after every 800 mini-batches, which gives a 1% replay rate. The average test accuracy at each time step, shown in Figure 5, demonstrates that f_θ maintains its performance on seen tasks while learning from new tasks.
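The 1% replay rate follows from the batch size of 8 used in training; as a quick arithmetic check (our framing):

```python
# 64 examples replayed after every 800 mini-batches of size 8:
replayed_per_cycle = 64
examples_per_cycle = 800 * 8          # examples seen between replay events
replay_rate = replayed_per_cycle / examples_per_cycle
assert abs(replay_rate - 0.01) < 1e-12  # the 1% replay rate quoted above
```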

Appendix C Results
C.1 Negative ∆log-odds

Figure 6 shows the negative ∆log-odds for each of the LIME-based models. We can see that LLE_lime performs very similarly to its teacher LIME_l and becomes better than LIME_s after seven tasks. We do not compare the LRP-based methods here, as LRP assumes that all words contribute positively to the final prediction.

C.2 Experience Replay on LLE
The effect of using ER is measured using ∆log-odds and IoU in Tables 2 to 5. These experimental results show that, by leveraging the most recent ground truth (teacher explanations) in ER, LLE generates better explanations in terms of both faithfulness and stability.