Rationalization through Concepts

Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features such as relevant text snippets from which the model computes the outcome. However, a single overall selection does not provide a complete explanation, e.g., weighing several aspects for decisions. To this end, we present a novel self-interpretable model called ConRAT. Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT extracts a set of text snippets as concepts and infers which ones are described in the document. Then, it explains the outcome with a linear aggregation of concepts. Two regularizers drive ConRAT to build interpretable concepts. In addition, we propose two techniques to boost the rationale and predictive performance further. Experiments on both single- and multi-aspect sentiment classification tasks show that ConRAT is the first to generate concepts that align with human rationalization while using only the overall label. Further, it outperforms state-of-the-art methods trained on each aspect label independently.


Introduction
Neural models have become the standard for many tasks, owing to their large performance gains. However, their adoption in decision-critical fields is more limited because of their lack of interpretability, particularly with textual data.
One of the simplest means of explaining predictions of complex models is by selecting relevant input features. Attention mechanisms (Bahdanau et al., 2015) model the selection using a conditional importance distribution over the inputs, but the resulting explanations are noisy (Jain and Wallace, 2019; Pruthi et al., 2020). Multi-head attention (Vaswani et al., 2017) extends attention mechanisms to attend information from different perspectives jointly. However, no explicit mechanisms guarantee a logical connection between different views (Voita et al., 2019; Kovaleva et al., 2019). Another line of research includes rationale generation methods (Lei et al., 2016; Chang et al., 2020; Antognini et al., 2021b). If the selected text input features are short and concise (called a rationale) and suffice on their own to yield the prediction, it can potentially be understood and verified against domain knowledge (Chang et al., 2019).
Figure 1: An illustration of ConRAT. Given a beer review, ConRAT identifies five excerpts that relate to particular concepts of beers (i.e., the explanation), depicted in color, from which it computes the outcome.
The key motivation for this work arises from the limitations of rationales. Rationalization models strive for one overall selection to explain the outcome by maximizing the mutual information between the rationale and the label. However, useful rationales can be multi-faceted, where each facet relates to a particular "concept" (see Figure 1). For example, users typically justify their opinions of a product by weighing explanations: one for each aspect they care about (Musat and Faltings, 2015).
Inspired by how human reasoning comprises concept-based thinking (Armstrong et al., 1983;Tenenbaum, 1999), we aim to discover, in an unsupervised manner, a set of concepts to explain the outcome with a weighted average, similar to multi-head attention. In this work, we relate concepts to semantically meaningful and consistent excerpts across multiple texts. Unlike topic modeling, where documents are described by a set of latent topics comprising word distributions, our latent concepts relate to text snippets that are relevant for the prediction.
Another motivation for this study is to generate interpretable concepts. The explanation of an outcome should rely on concepts that satisfy the desiderata introduced in Alvarez-Melis and Jaakkola (2018). They should 1. preserve relevant information, 2. not overlap with each other and be diverse, and 3. be human-understandable. Figure 1 shows an example of concepts in the beer domain.
In this work, we present a novel self-explaining neural model: the concept-based rationalizer (ConRAT) (see Figures 1 and 2). Our new rationalization scheme first identifies a set of concepts in a document and then decides which ones are currently described (binary selection). ConRAT explains the prediction with a linear aggregation of concepts. The model is trained end-to-end, and the concepts are learned in an unsupervised manner. In addition, we design two regularizers that guide ConRAT to induce interpretable concepts and propose two optional techniques, knowledge distillation and concept pruning, in order to boost the performance further.
We evaluate ConRAT on both single- and multi-aspect sentiment classification with up to five target labels. Upon training ConRAT only on the overall aspect, the results show that ConRAT generates concepts that are relevant, diverse, and non-overlapping, and they also recover human-defined concepts. Furthermore, our model significantly outperforms strong supervised baseline models in terms of predictive and explanation performance.

Related Work
Developing interpretable models is of considerable interest to the broader research community. Researchers have investigated many approaches to improve the interpretability of neural networks.
The first line of research aims at providing post-hoc explanations of an already trained model. For example, gradient and perturbation-based methods attribute the decision to important input features (Ribeiro et al., 2016; Sundararajan et al., 2017; Lundberg and Lee, 2017; Shrikumar et al., 2017). Other studies identified the causal relationships between input-output pairs (Alvarez-Melis and Jaakkola, 2017; Goyal et al., 2019). In contrast, our model is inherently interpretable as it directly produces the prediction with an explanation.
Another line of research has developed interpretable models. Quint et al. (2018) extended a variational auto-encoder with a differentiable decision tree. Alaniz and Akata (2019) proposed an explainable observer-classifier framework whose predictions can be exposed as a binary tree. However, these methods have been designed for images only, while our work focuses on text input.
The works most relevant to ours relate to interpretable models from the rationalization field (Lei et al., 2016; Bastings et al., 2019; Yu et al., 2019; Chang et al., 2020; Jain et al., 2020; Paranjape et al., 2020). These methods justify their predictions by selecting rationales (i.e., relevant tokens in the input text). However, they are limited to explaining the prediction with mostly one text span, and they rely on the assumption that the data have low internal correlations (Antognini et al., 2021b). Chang et al. (2019) extended previous methods to extract an additional rationale in order to counter the prediction. In our work, ConRAT produces multi-faceted rationales and explains the prediction through a linear aggregation of the extracted concepts. However, if we set the number of concepts to one, ConRAT reduces to a special case of a rationale model.

Explanations through Concepts.
Researchers have proposed multiple approaches for concept-based explanations. Kim et al. (2018) designed a post-hoc technique to learn concept activation vectors by relying on human annotations that characterize concepts of interest. Similarly, Bau et al. (2017); Zhou et al. (2018) generated visual explanations for a classifier. Our concepts are learned in an unsupervised manner and not defined a priori.
Few studies have learned concepts on images in an unsupervised fashion. Li et al. (2018) explained predictions based on the similarity of the input to "prototypes" learned during training. Alvarez-Melis and Jaakkola (2018) used an auto-encoder to extract relevant concepts and explain the prediction. Ghorbani et al. (2019) designed an unsupervised concept discovery method to explain trained models. Koh et al. (2020) employed the discovered concepts to predict the target label. Our work's key difference is that we focus on text data, while all these methods treat only image inputs.
To the best of our knowledge, Bouchacourt and Denoyer (2019) is the only study that has proposed a self-interpretable concept-based model for text data using reinforcement learning. It computes the predictions and provides an explanation in terms of the presence or absence of concepts in the input (i.e., text excerpts of variable lengths). However, their method achieves poor overall performance. In addition, it is unclear whether the discovered concepts are interpretable. Conversely, ConRAT is differentiable, clearly outperforms strong models in terms of predictive and explanation performance, and it infers relevant, diverse, non-overlapping, and human-understandable concepts.

Topic Modeling.
Topic models, such as latent Dirichlet allocation (Blei et al., 2003), describe documents with a mixture of latent topics. Each topic represents a word distribution. Some studies combined topic models with recurrent neural models (Dieng et al., 2017;Zaheer et al., 2017). However, the goal of these generative models and the topics remains different than this work's. We aim to build a self-interpretable model that predicts and explains the outcome with latent concepts.
Concept-based Rationalizer (ConRAT)
Figure 2 depicts the architecture of our proposed self-explaining model: the Concept-based Rationalizer (ConRAT). Let $X$ be a random variable representing a document composed of $T$ words $(x_1, x_2, \dots, x_T)$, $y$ the ground-truth label, and $K$ the desired number of concepts (our method is easily adapted for regression problems). Given a document $X$ and a label $y$, our goal is to explain the prediction $\hat{y}$ by finding a set of $K$ concepts $C_1, \dots, C_K$ that are masked versions of $X$. ConRAT learns concepts by maximizing the mutual information between $C$ and $y$. We guide ConRAT to create separable and consistent concepts via two regularizers to make them human-understandable.
Figure 2: The proposed self-explaining model ConRAT. The model predicts and explains $\hat{y}$. Given a document $X$, the concept generator produces one binary mask per concept. The concept selector decides which concepts are present in the input. The predictor aggregates each selected concept's prediction to compute $\hat{y}$.

Model Overview
ConRAT is divided into three submodels: a Concept Generator $g_\theta(\cdot)$, which finds the concepts $C_1, \dots, C_K$; a Concept Selector $s_\theta(\cdot)$, which detects whether a concept $C_k$ is present or absent (i.e., $s_k \in \{1, 0\}$) in the input $X$; and a Predictor $f_\theta(\cdot)$, which predicts the outcome $\hat{y}$ based on the concepts $C$ and their presence scores $S$.

Concept Generation
Inspired by the selective rationalization field (Lei et al., 2016), we define a "concept" as a sequence of consecutive words in the input text. Previous studies extracted only one concept $C_1$ that is sufficient to explain the target variable $y$. In our work, a major difference is that we aim to find $K$ concepts $C_1, \dots, C_K$ that represent different topics or aspects and altogether explain the target variable $y$. We interpret the model as being linear in the concepts rather than depending on one overall selection of words. More formally, we define a concept as follows:
$$C_k = M_k \odot X, \quad (1)$$
where $M_k \in \mathcal{S}$ denotes a binary mask, $\mathcal{S}$ is a subset of $\mathbb{Z}_2^T$ with some constraints (introduced in Section 3.2), and $\odot$ is the element-wise multiplication of two vectors.
We parametrize the binary masks $M \in \mathbb{Z}_2^{K \times T}$ with the concept generator model $g_\theta(\cdot)$, based on a bi-directional recurrent neural network. Following previous rationalization research (Yu et al., 2019; Chang et al., 2020), we force $g_\theta(\cdot)$ to select one chunk of text per concept with a prespecified length $\ell \in [1, T]$. Instead of predicting the mask $M_k$ directly, $g_\theta(\cdot)$ produces a score for each position $t$. Then, it samples the start position $t^*_k$ of the chunk for each $C_k$ using the straight-through Gumbel-Softmax (Maddison et al., 2017; Jang et al., 2017). Finally, we compute $M_k$ as follows:
$$M_{k,t} = \mathbb{1}\big[t \in [t^*_k, t^*_k + \ell - 1]\big], \quad (2)$$
where $\mathbb{1}$ denotes the indicator function. Although the equation is not differentiable, we can employ the straight-through technique (Bengio et al., 2013) and approximate it with the gradient of a causal convolution whose kernel is an all-one vector of length $\ell$.
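To make the mask construction concrete, the following is a minimal NumPy sketch of the forward pass only: a start position is drawn with Gumbel noise and turned into a chunk mask by a causal convolution with an all-one kernel. The function names `sample_start` and `chunk_mask` are hypothetical, and the straight-through gradient (which needs an autodiff framework) is not shown.

```python
import numpy as np

def sample_start(scores: np.ndarray, rng: np.random.Generator) -> int:
    """Gumbel-max draw of a start position from unnormalized scores (forward pass only)."""
    noise = rng.gumbel(size=scores.shape)
    return int(np.argmax(scores + noise))

def chunk_mask(start: int, length: int, T: int) -> np.ndarray:
    """Binary mask with ones on [start, start + length - 1], clipped at the document end."""
    one_hot = np.zeros(T)
    one_hot[start] = 1.0
    kernel = np.ones(length)
    # Causal convolution: output position t sums one_hot[t - length + 1 .. t],
    # so it is 1 exactly when start <= t <= start + length - 1.
    return np.convolve(one_hot, kernel)[:T]

rng = np.random.default_rng(0)
scores = np.array([0.1, 0.3, 2.0, 0.2, 0.1, 0.0, 0.0, 0.0])
t_star = sample_start(scores, rng)
mask = chunk_mask(t_star, length=3, T=len(scores))
```

Writing the mask as a convolution of a one-hot vector is what allows a gradient to flow back to the position scores once the hard sample is replaced by its relaxed counterpart in the backward pass.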

Concept Selection
A key objective of ConRAT is to produce semantically consistent and separable concepts. So far, the generator $g_\theta(\cdot)$ generates $K$ concepts for any input document. However, some documents might mention only a subset of those. Thus, the goal of the concept selector model $s_\theta(\cdot)$ is to enable ConRAT to ignore absent concepts. Specifically, for each concept $C_k$, the model first computes a concept representation $H_{C_k}$ using a standard attention mechanism (Bahdanau et al., 2015) (the tokens whose $M_{k,t} = 0$ are masked out). Then, we take the dot product of $H_{C_k}$ with a weight vector $v$, followed by a sigmoid activation function $\sigma$ to induce the log-probabilities of a relaxed Bernoulli distribution (Jang et al., 2017). Finally, we sample the presence score $s_k \in \{0, 1\}$ of each concept independently:
$$s_k \sim \text{RelaxedBernoulli}\big(\sigma(v^\top H_{C_k})\big).$$
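The sampling step can be illustrated with a short NumPy sketch of a relaxed Bernoulli (binary Concrete) draw followed by hard thresholding. It assumes that the selector exposes one pre-sigmoid score per concept; the name `sample_presence` and the thresholding at 0.5 are illustrative choices, not taken from the source. The temperature value lies within the range reported in Appendix A.

```python
import numpy as np

def sample_presence(logits: np.ndarray, temperature: float, rng: np.random.Generator):
    """Return relaxed samples in [0, 1] and their hard 0/1 counterparts."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)  # Logistic(0, 1) noise
    soft = 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))
    hard = (soft > 0.5).astype(float)  # forward value; gradients would use `soft`
    return soft, hard

rng = np.random.default_rng(0)
soft, hard = sample_presence(np.array([20.0, -20.0, 0.0]), temperature=1.0, rng=rng)
```

In a straight-through setup, the forward pass would use `hard` while the backward pass differentiates through `soft`.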

Prediction
As inputs, the predictor $f_\theta(\cdot)$ takes the document $X$, the masks $M$, and the presence scores $S$ for all concepts. First, we extract the concepts, which are masked versions of $X$. Unlike in Equation 1, the concepts are ignored if $s_k = 0$:
$$C_k = s_k \cdot (M_k \odot X).$$
Second, the model produces the hidden representation $h_{C_k}$ with another recurrent neural network, followed by a LeakyReLU activation function (Xu et al., 2015). Then, it computes the logits of $y$, denoted $z_k$, by applying a linear projection for each concept:
$$z_k = W h_{C_k} + b,$$
where $W$ and $b$ are the projection parameters. Finally, $f_\theta$ computes the final outcome as follows:
$$\hat{y} = \mathrm{softmax}\Big(\sum_{k=1}^{K} \alpha_k z_k\Big),$$
where $\alpha_k$ are model parameters that can be interpreted as the degree to which a particular concept contributes to the final prediction.
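The aggregation step can be sketched as follows. This is a simplified illustration under two assumptions that the text does not confirm: absent concepts (presence 0) contribute nothing to the sum, and a softmax is applied to the aggregated logits. The function name `aggregate` and the toy numbers are hypothetical.

```python
import numpy as np

def aggregate(concept_logits: np.ndarray, alpha: np.ndarray, presence: np.ndarray):
    """concept_logits: (K, n_classes); alpha, presence: (K,). Returns class probabilities
    and the per-concept contributions that make up the explanation."""
    contributions = (alpha * presence)[:, None] * concept_logits
    logits = contributions.sum(axis=0)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum(), contributions

concept_logits = np.array([[2.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
probs, contributions = aggregate(concept_logits,
                                 alpha=np.array([1.0, 1.0, 1.0]),
                                 presence=np.array([1.0, 1.0, 0.0]))
```

Because the outcome is linear in the per-concept terms before normalization, each row of `contributions` can be read directly as how much that concept pushes each class.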

Unsupervised Discovery of Concepts
The above formulations integrate the explanation into the outcome computation. However, $M_k$ is by definition faithful to the model's inner workings but not comprehensible for the end-user. Following Alvarez-Melis and Jaakkola (2018), we want the concepts to satisfy three desiderata: 1. Fidelity: they should preserve relevant information, 2. Diversity: they should be non-overlapping and diverse, and 3. Grounding: they should have an immediate human-understandable interpretation. The hard constraint in Equation 2 naturally enforces the grounding by forcing the concept to be a sequence of words. Fidelity is partly integrated in ConRAT by the prediction loss, which is the cross-entropy between the ground-truth label $y$ and the prediction $\hat{y}$: $L_{pred} = \mathrm{CE}(\hat{y}, y)$. Recall that the concepts are substitutes of the input that are sufficient for the prediction. We emphasize the word "partly" because nothing prevents ConRAT from picking up spurious correlations.
We propose two regularizers to encourage ConRAT to find non-overlapping, relevant, and dissimilar concepts. The first favors the orthogonality of concepts by penalizing redundant rows in $M$:
$$L_{overlap} = \big\| M M^\top - \ell \cdot \mathbb{1} \big\|_F^2,$$
where $\|\cdot\|_F$ stands for the Frobenius norm of a matrix, $\mathbb{1}$ denotes the identity matrix, and $\ell$ the prespecified concept length. However, $L_{overlap}$ alone does not prevent ConRAT from learning concepts of little relevance. Therefore, we propose a second regularizer to encourage fidelity and diversity by minimizing the cosine similarity between the concept representations $H_{C_k}$ (see Section 3.1.2):
$$L_{div} = \sum_{k_1=1}^{K} \sum_{k_2=k_1+1}^{K} \cos\big(H_{C_{k_1}}, H_{C_{k_2}}\big).$$
In both regularizers, we do not consider the presence scores $S$ because a model could always select only one concept; this strategy is not optimal and reduces to a special case of rationale models (i.e., $S$ would become a one-hot vector).
To summarize, the concepts are learned in an unsupervised manner and align with the three desiderata mentioned above: diversity is achieved with $L_{overlap}$ and $L_{div}$; fidelity is enforced by $L_{pred}$ and $L_{div}$; and the hard constraint in Equation 2 ensures the grounding. Finally, we train ConRAT end-to-end and minimize the loss jointly:
$$L = L_{pred} + \lambda_O L_{overlap} + \lambda_D L_{div}.$$
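As an illustration of what the two regularizers measure, here is a small NumPy sketch. The exact functional forms are assumptions: the overlap term is taken as the squared Frobenius distance between the Gram matrix of the masks and a scaled identity, and the diversity term as the sum of pairwise cosine similarities. The names `overlap_penalty` and `diversity_penalty` are hypothetical.

```python
import numpy as np

def overlap_penalty(M: np.ndarray, length: int) -> float:
    """M: (K, T) binary masks. Zero when all masks are disjoint and of the given length."""
    K = M.shape[0]
    gram = M @ M.T  # entry (i, j) counts the tokens shared by masks i and j
    return float(np.sum((gram - length * np.eye(K)) ** 2))

def diversity_penalty(H: np.ndarray) -> float:
    """H: (K, d) concept representations. Sum of cosine similarities over distinct pairs."""
    unit = H / np.linalg.norm(H, axis=1, keepdims=True)
    cos = unit @ unit.T
    rows, cols = np.triu_indices(H.shape[0], k=1)
    return float(cos[rows, cols].sum())

disjoint = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
shifted = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]])
print(overlap_penalty(disjoint, 2), overlap_penalty(shifted, 2))
```

The off-diagonal entries of the Gram matrix are what penalize shared tokens, while the cosine term acts on the learned representations rather than on token positions, so two disjoint chunks with near-identical content are still discouraged.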

Improving Overall Performance Further
The purpose of self-explaining models is to compute outcomes while being more interpretable. However, one key point is to achieve predictive performance comparable to that of black-box models. We propose two techniques to further improve both interpretability and performance; however, ConRAT does not require these techniques to outperform other methods, as we will see later.
Knowledge Distillation. We can train ConRAT not only via the information provided by the true labels but also by observing how a teacher model behaves (Hinton et al., 2015). In that case, we introduce the teacher model $T_\theta(\cdot)$, which is a simple recurrent neural network similar to the predictor $f_\theta$. It is trained on the same data, but it uses the whole input $X$ instead of the subsets selected by each $C_k$. The overall training loss becomes
$$L = L_{pred} + \lambda_O L_{overlap} + \lambda_D L_{div} + \lambda_T L_{teacher},$$
where $L_{teacher}$ penalizes the discrepancy between the predictions of ConRAT and those of the teacher.
Pruning Concepts. Depending on the number of concepts and the pre-specified length, the total number of selected words can be close to or higher than the document length (e.g., if a document contains 200 tokens and we aim to extract 10 concepts of 20 tokens, all words would be selected). In practice, it is hard to extract meaningful concepts in such settings. To alleviate this problem, we propose to prune concepts at inference and select the top-k concepts that overlap the least with the others. More specifically, we compute the overlap as follows: for each sample in the validation set, we measure the average overlap ratio between $M_{k_1}$ and $M_{k_2}$ for each concept pair $(C_{k_1}, C_{k_2})$, $k_1 \neq k_2$. Then, we select the top-k concepts whose scores are the lowest. Finally, to compute the new prediction $\hat{y}$, we update $s_k = 1$ if $C_k$ is in the top-k or $s_k = 0$ otherwise.
Table 1: Data statistics.
Dataset                 Amazon       Beer
# Reviews               24,000       60,000
Split Train/Val/Test    20k/2k/2k    50k/5k/5k
# Annotations           471          994
# Human Aspects         1            5
# Words per review      224 ± 125    184 ± 58
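The pruning procedure can be sketched in NumPy as follows. The definition of the overlap ratio is an assumption (tokens shared by two masks divided by the size of the first mask), the function name `prune_concepts` is hypothetical, and the sketch assumes at least two concepts.

```python
import numpy as np

def prune_concepts(masks: np.ndarray, keep: int):
    """masks: (N, K, T) binary masks over N validation documents, with K > 1.
    Returns the sorted indices of the `keep` least-overlapping concepts and all scores."""
    N, K, T = masks.shape
    shared = np.einsum("nkt,njt->nkj", masks, masks)   # shared tokens per concept pair
    size = np.maximum(masks.sum(axis=2), 1.0)          # tokens per concept, avoids 0/0
    ratio = (shared / size[:, :, None]).mean(axis=0)   # (K, K), averaged over documents
    np.fill_diagonal(ratio, 0.0)                       # ignore self-overlap
    score = ratio.sum(axis=1) / (K - 1)                # mean overlap with the other concepts
    kept = np.sort(np.argsort(score, kind="stable")[:keep])
    return kept, score

masks = np.array([[[1, 1, 0, 0, 0, 0],
                   [0, 1, 1, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1]]], dtype=float)
kept, score = prune_concepts(masks, keep=2)
presence = np.isin(np.arange(masks.shape[1]), kept).astype(float)  # s_k after pruning
```

Since the scores are computed once on the validation set, the same subset of concepts is kept for every test document, which keeps the explanation format fixed at inference.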

Datasets
We evaluate the quantitative performance of ConRAT using two binary classification datasets. The first one is the single-aspect Amazon Electronics dataset (Ni et al., 2019). We followed the filtering process in Chang et al. (2019) to keep only the reviews that contain evidence for both positive and negative sentiments. Specifically, we considered the first 50 tokens after the words "pros:" and "cons:" as the rationale annotations for the positive and negative labels, respectively. We randomly picked 24,000 balanced samples with ratings of four and above or two and below.
The second dataset comprises the multi-aspect beer reviews (McAuley et al., 2012) used in the field of rationalization (Lei et al., 2016;Yu et al., 2019). Each review describes various beer aspects: Appearance, Aroma, Palate, Taste, and Overall; users also provided a five-star rating for each aspect. However, we only use the overall rating for ConRAT. The dataset includes 994 beer reviews with sentence-level aspect annotations. Following the evaluation protocol in Bao et al. (2018); Chang et al. (2020), we binarized the ratings ≤ 2 as negative and ≥ 3 as positive. We sampled 60,000 balanced examples. Our setting is more challenging than those in previous studies because we assess the performance on all aspects (instead of three) and consider all examples for the sampling (instead of de-correlated subsets), reflecting the real data distribution. Table 1 shows the data statistics.

Baselines
We consider the following baselines. RNP is a generator-predictor framework proposed by Lei et al. (2016) for rationalizing neural prediction. The generator selects text spans as rationales, which are then fed to the classifier for the final prediction. Yu et al. (2019) introduced RNP-3P, which extends RNP to include the complement predictor as the third player. It maximizes the predictive accuracy from unselected words. The training consists of an adversarial game with the three players. Intro-3P (Yu et al., 2019) improves RNP-3P by conditioning the generator on the predicted outcome of a teacher model. InvRAT is a game-theoretic method that competitively rules out spurious words with strong correlations to the output. The game-theoretic approach CAR aims to infer a rationale and a counterfactual rationale that counters the true label. We follow Chang et al. (2020) and consider for all methods their hard constraint variant (i.e., selecting one chunk of text) with different lengths for generating rationales.
RNP-3P and Intro-3P are trained with the policy gradient (Williams, 1992). The others estimate the gradients of the rationale selections using the straight-through technique (Bengio et al., 2013).
All rationalization methods, except CAR, strive for a single overall selection (K = 1) to explain the outcome. For the multi-aspect dataset, we train and tune each baseline independently for each aspect. The key difference with ConRAT is that the model is only trained on the overall aspect label and infers one rationale of K concepts; the baselines are trained K times to infer one rationale of one concept.

Experimental Details
To seek fair comparisons, we try to keep a similar number of parameters across all models, and we employ the same architecture for each player (generators, predictors, and discriminators/teachers) in all models: bi-directional gated recurrent units (Chung et al., 2014) with a hidden dimension of 256. We use the 100-dimensional GloVe word embeddings (Pennington et al., 2014) and Adam (Kingma and Ba, 2015) as the optimization method with a learning rate of 0.001. We set the convolutional neural network in the concept selector similarly to Kim et al. (2015), with 3-, 5-, and 7-width filters and 50 feature maps per filter. For ConRAT, we set the regularizer factors as follows: $\lambda_O = 0.05$, $\lambda_D = 0.05$, and $\lambda_T = 0.5$. We use the open-source implementation for all models, and we tune them by maximizing the prediction accuracy on the dev set with 16 random searches. For reproducibility purposes, we include additional details in Appendix A.
Table 2 (row fragment): ConRAT-2 75.3 33.7 19.4 24.6 8.9 5.1 6.5

RQ 1: Can ConRAT find evidence for factual and counterfactual rationales?
We aim to validate whether ConRAT can identify both pieces of evidence, for the positive and the negative sentiments. We set the concept length $\ell = 30$, compare the generated rationales with the annotations, and report the precision, recall, and F1 score. In this experiment, no teacher is used in ConRAT. Table 2 contains the results. The top rows contain the results when only the factual rationales are considered for the evaluation, and ConRAT-1 uses only one concept. We see that ConRAT surpasses the baselines in finding rationales that align with human annotations, and it also matches the test accuracy of the baselines. Interestingly, we note that the baselines achieving the highest accuracy underperform in finding the correct rationales.
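For reference, the comparison between a generated rationale and a human annotation can be computed as below. The sketch assumes token-level matching between two binary masks, which is a common convention in the rationalization literature but is not spelled out in the text; the function name `rationale_prf` is hypothetical.

```python
import numpy as np

def rationale_prf(pred_mask, gold_mask):
    """Token-level precision, recall, and F1 between two binary masks of equal length."""
    pred = np.asarray(pred_mask, dtype=bool)
    gold = np.asarray(gold_mask, dtype=bool)
    tp = int(np.sum(pred & gold))
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gold.sum() if gold.sum() else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)

# Three selected tokens, four annotated tokens, two in common.
p, r, f1 = rationale_prf([1, 1, 1, 0, 0], [0, 1, 1, 1, 1])
```

With a fixed concept length, precision and recall trade off against the annotation length: a 30-token chunk can never fully cover a 50-token annotation, so recall is bounded by 0.6 in that case.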
For the factual and counterfactual rationales, CAR finds one rationale to support the outcome and another one to counter it, in an adversarial game. However, the concepts inferred by ConRAT are not guaranteed to align with the rationales as there is no explicit signal to infer counterfactual concepts. Thus, we increase the number of concepts up to six and prune ConRAT to consider only the two most dissimilar concepts (see Section 3.3).
The bottom of Table 2 shows the results. With only two concepts, ConRAT-2 outperforms CAR in terms of test accuracy and matches the performance for the factual rationales, but it poorly identifies counterfactual rationales. However, there is a major improvement when we increase the number of concepts and use pruning. Indeed, the word distributions of the factual and counterfactual rationales are different and hence captured with pruning. ConRAT's factual rationales are better than those of all models. The counterfactual ones get closer to those produced by CAR. We show later in Section 4.6 that pruning helps in achieving better correlation with human judgments but is not required.
Table 3: Objective performance of rationales for the multi-aspect beer reviews. ConRAT only uses the overall label and ignores the other aspect labels. All baselines are trained separately on each aspect rating. Bold and underline denote the best and second-best results, respectively.

RQ 2: Are concepts inferred by ConRAT consistent with human rationalization?
We investigate whether ConRAT can recover all beer aspects by using only the overall ratings. Because beer reviews are shorter than Amazon ones, we set the concept length to 10 and 20. We fix the number of concepts to ten and prune ConRAT to keep five. We manually map them to the closest aspect for comparison. We trained the teacher model, used in Intro-3P and ConRAT, and obtained 91.4% accuracy. More results and illustrations are available in Appendix B and C.
Objective Evaluation. Similar to Section 4.4, we compare the generated rationales with the human annotations on the five aspects and the average performance. The main results are shown in Table 3. On average, ConRAT achieves the best performance while trained only on the overall ratings. This shows that the generated concepts, learned in an unsupervised manner, are separable, consistent, and correlated with human judgments to a certain extent. For the concept length $\ell = 20$, ConRAT produces significantly superior results for all aspects, whereas the difference with InvRAT is less pronounced for $\ell = 10$. Finally, ConRAT's concepts lead to the highest accuracy and respect the grounding desideratum, thanks to the teacher. We hypothesize that the baselines underperform due to the high correlations among the aspect ratings. Thus, they are more prone to pick up spurious correlations between the input features and the output. By considering multiple concepts simultaneously, ConRAT reduces the impact of spurious correlations. Regarding Intro-3P and RNP-3P, both suffer from instability issues due to the policy gradient (Chang et al., 2020; Yu et al., 2019).
We visualize an example in Figure 3. We observe that ConRAT induces interpretable concepts, while the best baselines suffer from spurious correlations. By reading our concepts alone, humans will easily predict the aspect label and its polarity.
Subjective Evaluation. We conduct a human evaluation using Amazon's Mechanical Turk (details in Appendix B.2) to judge the understandability of the concepts. Following Chang et al. (2019), we sampled 100 balanced reviews from the holdout set for each aspect, model, and concept length, resulting in 5,000 samples. We showed the examples in random order. An evaluator is presented with the concept generated by one of the five methods (unselected words are not visible). We credit a success when the evaluator guesses the true aspect label and its sentiment. We report the success rate as the performance metric. A random guess has a 10% success rate. Figure 4 shows the main results. Similar to the objective evaluation, ConRAT reaches the best performance, followed by InvRAT. Moreover, ConRAT only requires a single training on the overall aspect. It emphasizes that the discovered concepts satisfy the fidelity and diversity desiderata and better correlate with human judgments compared with supervised baselines.
Figure 3 (panels: ConRAT (Ours); InvRAT (Chang et al., 2020); RNP (Lei et al., 2016)). Review shown in each panel: appearance : pours a slightly murky ice tea brown color with a frothy head and some lacing smell : malted milk chocolate and hazelnuts ; rather bready taste : starts with a very clean malty base which turns a bit earthy and coarse in the aftertaste mouthfeel : very smooth but a tad below medium bodied ; moderate carbonation drinkability : a very pleasant scottish that is marked down a bit for its mediocre finish

RQ 3: How does the number of concepts K in ConRAT affect the performance?
We study the impact of the number of concepts $K$ in ConRAT on the performance, as discussed in Section 4.5. We set the number of concepts to the number of aspects ($K = 5$) and then increase it to $K = 10$ and $K = 20$. We prune ConRAT to keep only the five most dissimilar concepts (see Section 3.3). Results are shown in Table 4. First, we observe that the performance is already better than the baselines in Table 3 with $K = 5$. Second, when increasing $K$ and pruning ConRAT, the performance is boosted further. However, we remark that the interpretability of the concepts follows a bell curve and significantly decreases when $K = 20$. One potential reason is that we expect overlaps between the discriminative concepts that relate to beer aspects. Thus, the five most dissimilar concepts might align less with human-defined concepts.

RQ 4: How does each module of ConRAT contribute to the overall performance?
Finally, we analyze the importance of each module in an ablation study. To avoid any bias from pruning, we set the number of concepts to five. Table 5 shows the results. When ConRAT ignores the overlapping or the diversity regularizer, we observe a large drop in the rationale performance. This is expected as the diversity desideratum is not encouraged anymore. However, we remark that the sentiment prediction accuracy increases, which is certainly caused by spurious correlation with the ground-truth label. When all concepts are considered ($s_k = 1\ \forall k$), we note that the sentiment accuracy stays similar. However, the objective performance decreases by 10% for the precision and more than 20% for the recall and F1 score. These results align with prior work: users write opinions about the topics they care about (Musat and Faltings, 2015; Antognini et al., 2021a). ConRAT reduces the noise at training by selecting concepts described in the current document. Finally, the teacher model helps ConRAT to boost the sentiment accuracy by more than 3% absolute score, without affecting the rationale quality.

Conclusion
Providing explanations for automated predictions carries much more impact, increases transparency, and might even be vital. Previous works have proposed using rationale methods to explain the prediction of a target variable. However, they do not properly capture the multi-faceted nature of useful rationales. We proposed ConRAT, a novel self-explaining model that extracts a set of concepts and explains the outcome with a linear aggregation of concepts, similar to how humans reason. Our second contribution is two novel regularizers that guide ConRAT to generate interpretable concepts. Experiments on both single- and multi-aspect sentiment classification datasets show that ConRAT, by using only the overall label, is the first to provide superior rationale and predictive performance compared with supervised state-of-the-art methods trained for each aspect label. Moreover, ConRAT produces concepts considered superior in interpretability when evaluated by humans.

A Additional Training Details
We tune all models on the dev set. We truncate all reviews to 320 tokens for the beer dataset and 400 tokens for Amazon reviews. We performed a random search over 16 trials. All baselines, except CAR, are tuned for each aspect (80 trials in total for the five aspects). We chose the models achieving the highest validation accuracy. Most of the time, all models converged in under 30 epochs. The ranges of the hyperparameters are the following for ConRAT (similar for other models):
• Learning rate:
• Gumbel temperature in $s_\theta(\cdot)$: [1.0; 1.5];
A.1 Hardware / Software
• CPU: 2x Intel Xeon E5-2680 v3, 2x 12 cores, 24 threads, 2.5 GHz, 30 MB cache;
• RAM: 16x16GB DDR4-2133;
• GPU: 2x Nvidia Titan X Maxwell;
• OS: Ubuntu 18.04;
• Software: Python 3, PyTorch 1.3, CUDA 10.

B Complementary Results RQ 2

B.1 Objective Evaluation
The results for a concept length of five are shown in Table 6. Moreover, we report in Table 7 the performance for the unsupervised sentiment prediction task on the aspects whose labels are not available to ConRAT: Appearance, Aroma, Palate, and Taste. As we can see, ConRAT achieves competitive results compared to supervised baselines.

B.2 Human Evaluation Details
We use Amazon's Mechanical Turk crowdsourcing platform to recruit human annotators to evaluate the quality of the extracted and generated justifications produced by each model. To ensure the high quality of the collected data, we restricted the pool to native English speakers from the U.S., U.K., Canada, or Australia. Additionally, we set the worker requirements at a 98% approval rate and more than 1,000 HITs.
Figure 5 shows the user interface used in Section 4.5 to judge the quality of the justifications extracted by the different methods.

B.3 Subjective Evaluation
All results (for the joint, the aspect, and the polarity accuracy) are shown in Figure 6. In total, we used 7,500 samples (100 × 5 × 5 × 3).
We also studied the error rates on each aspect. The Aroma and Palate aspects cause the highest error rates for all models. One possible reason is that users confuse these with the aspect Taste, hence their high correlations in rating scores (Antognini et al., 2021b).

C Extra Visualizations
Additional samples of generated rationales are shown in Figures 7, 8, 9, and 10. We can observe that baselines suffer from spurious correlations: the rationales for the aspects Aroma, Palate, and Taste are often exchanged, or several rationales pick the same text snippets. On the other hand, ConRAT finds better concepts while being trained only on the overall aspect label. As has been shown in prior work (Lei et al., 2016; Chang et al., 2020; Antognini et al., 2021b), rationale methods suffer from the high correlation between rating scores because each model is trained independently for each aspect. Therefore, they rely on the assumption that the data have low internal correlations, which does not reflect the real data distribution. By contrast, ConRAT alleviates this problem by finding all concepts in one training.

Table 6: Objective performance of rationales for the multi-aspect beer reviews with the concept length set to five. ConRAT only uses the overall rating and does not have access to the other aspect labels. All baselines are trained separately on each aspect label. Bold and underline denote the best and second-best results, respectively.

Table 7: Performance on the overall sentiment and the aspects whose labels are not available to ConRAT. Bold and underline denote the best and second-best results.

Figure 7: Examples of generated rationales with the concept length set to 10 for a beer review. Panels: ConRAT (Ours), InvRAT (Chang et al., 2020), and RNP (Lei et al., 2016). Underline highlights ambiguities.
Review: on-tap at lagunitas a : the pour is a hazy straw color with an initially fluffy white head that slowly dies down to a thin layer . s : an amazingly funky and tart aroma is present immediately . lots of sour apples , lemons , and maybe some green grapes along with a subtle wood character and a bit of grass . t : the tartness is bright and green , lots of lemons and apples . the oak , grass , wet straw , and mild earthiness give this beer a great funky balance to the sourness . m : the body is somewhat light and crisp , with a great level of effervescence and slight dryness . d : this is an absolutely fantastic beer . i would drink this like nobody 's business if it was more readily available and/or lagunitas was n't such a drive .

Figure 8: Examples of generated rationales with the concept length set to 10 for a beer review. Panels: ConRAT (Ours), InvRAT (Chang et al., 2020), and RNP (Lei et al., 2016). Underline highlights ambiguities.
Review: pours out in a opaque dark yellow colour , topped with a large , thick white foam . very cotton-like but fruity and strong aroma of of oranges , peaches and banana with undertones of coriander . it also has some weak vinous accents thick and wheaty flavour of cloves , banana , apricots and oranges . thick , full and round mouthfeel . quite tart in the back of the throat bananas in the long velvetly soft finish with hoppy note from orange-peels . a wonderfull winter wheat , too bad it was only 5000 bottles made

Figure 9: Examples of generated rationales with the concept length set to 20 for a beer review. Panels: ConRAT (Ours), InvRAT (Chang et al., 2020), and RNP (Lei et al., 2016). Underline highlights ambiguities.
Review: a : pours a clear dark amber colour . with a thick two finger creamy off white head . settles to a small cap . leaves quite a bit of lacing . s : caramel malt with a grainy smell . also a bit of a fruity smell closer to dark fruits t : caramel malt up front with a grainy taste . then it finishes with a more sweet dark fruity taste . finishes dry . m : medium carbonation with a medium body d : it 's a decent beer . nothing great but it gets the job done if you enjoy this style .

Figure 10: Examples of generated rationales with the concept length set to 20 for a beer review. Panels: ConRAT (Ours), InvRAT (Chang et al., 2020), and RNP (Lei et al., 2016). Underline highlights ambiguities.
Review: beer review 100 a -pours a light somewhat hazy gold color into my pint glass with about one finger of head moderate retention and very nice lacing . s -strong aroma of hops , pine and grapefruit citrus notes as well as sweet malts . t -to me , this is a great tasting ipa . sweet malts , followed by a very nice pine and citrus hop fusion that finishes with just the right amount of bitterness m -medium in body , crisp and refreshing . d -this drinks great as an ipa , and all you hopheads out there like myself remember this is an ipa , not a double or imperial , and for the category it 's in it is an awesome beer