Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. Such feedback is prohibitively time-consuming for human moderators to provide, and computational approaches are still nascent. This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner. Inspired by recent progress in unpaired sequence-to-sequence tasks, we introduce a self-supervised learning model, called CAE-T5. CAE-T5 employs a pre-trained text-to-text transformer, which is fine-tuned with a denoising and cyclic auto-encoder loss. Experimenting with the largest toxicity detection dataset to date (Civil Comments), our model generates sentences that are more fluent and better at preserving the initial content than earlier text style transfer systems, against which we compare using several scoring systems and human evaluation.


Introduction
There are many ways to express our opinions. When we exchange views online, we do not always immediately measure the emotional impact of our message. Even when the opinions expressed are legitimate, well-intentioned and constructive, a poor phrasing may make the conversation go awry (Zhang et al., 2018a). Recently, Natural Language Processing (NLP) research has tackled the problem of abusive language detection by developing accurate classification models that flag toxic (or abusive, offensive, hateful) comments (Davidson et al., 2017; Pavlopoulos et al., 2017; Wulczyn et al., 2017; Gambäck and Sikdar, 2017; Fortuna and Nunes, 2018; Zhang et al., 2018a; Van Hee et al., 2018; Zampieri et al., 2019). Table 1 shows examples of toxic comments from the Civil Comments test set and the more civil rephrasings generated by our model; the third example shows that the model's strategy may involve shifting the original intent, since "republican" is not a non-offensive synonym of "moron".
The prospect of healthier conversations, nudged by Machine Learning (ML) systems, motivates the development of Natural Language Understanding and Generation (NLU and NLG) models that could later be integrated in a system suggesting alternatives to vituperative comments before they are posted. A first approach would be to train a text-to-text model (Bahdanau et al., 2014;Vaswani et al., 2017) on a corpus of parallel comments where each offensive comment has a courteous and fluent rephrasing written by a human annotator. However, such a solution requires a large paired labeled dataset, in practice difficult and expensive to collect (see Section 4.5). Consequently, we limit our setting to the unsupervised case where the comments are only annotated in attributes related to toxicity, such as the Civil Comments dataset (Borkan et al., 2019). We summarize our investigations with the following research question: RQ: Can we fine-tune end-to-end a pre-trained text-to-text transformer to suggest civil rephrasings of rude comments using a dataset solely annotated in toxicity?
Answering this question might provide researchers with an engineering proof-of-concept that would enable further exploration of the many complex questions that arise from such a tool being used in conversations. The main contributions of this work are the following:
• We address, for the second time in the literature, the task of unsupervised civil rephrasing of toxic texts, relying for the first time on the Civil Comments dataset, and achieve results that reflect the effectiveness of our model over baselines.
• We develop a non-task-specific approach (i.e., with no human hand-crafting in its design) that can be generalized and later applied to related and/or unexplored attribute transfer tasks.
While several of the ideas we combine in our model have been studied independently, to the best of our knowledge, no existing unsupervised models combine sequence-to-sequence bi-transformers, transfer learning from large pre-trained models, and self-supervised fine-tuning (denoising autoencoder and cycle consistency). We discuss the related work introducing these tools and techniques in the following section.

Related work
Unsupervised complex text attribute transfer (such as civil rephrasing of toxic comments) remains in its early stages, and our particular applied task has only a single antecedent (Nogueira dos Santos et al., 2018). A great variety of prior work is nevertheless relevant, and this section summarizes the most important lines. We first describe the recent strategies (such as attention mechanisms; Bahdanau et al., 2014) that led to significant progress in supervised NLU and NLG tasks. Then, we present the most closely related lines of work on unsupervised text-to-text tasks.
2.1 Transformers are state-of-the-art architectures in NLP
Vaswani et al. (2017) showed that transformer architectures, based on attention mechanisms, achieve state-of-the-art results when applied to supervised Neural Machine Translation (NMT). More generally, transformers have proven capable in various NLP and speech tasks (Dong et al., 2018; Huang et al., 2019; Le et al., 2019; Li et al., 2019). Moreover, transformers benefit from pre-training before being fine-tuned on downstream tasks (Devlin et al., 2019; Dai et al., 2019b; Yang et al., 2019; Conneau and Lample, 2019; Raffel et al., 2019). Subsequent research has adopted uni-transformers in many supervised classification and regression tasks (Devlin et al., 2019).

2.2 Unsupervised attribute transfer
Unsupervised attribute transfer is the task most related to our work. It mainly focuses on sentiment transfer with standard review datasets (Maas et al., 2011; He and McAuley, 2016; Shen et al., 2017; Li et al., 2018), but also addresses sociolinguistic datasets containing text in various registers (Gan et al., 2017; Rao and Tetreault, 2018) or with different identity markers (Voigt et al., 2018; Prabhumoye et al., 2018; Lample et al., 2019). When paraphrase generation aims at being explicitly attribute-invariant, it is referred to as obfuscation or neutralization (Emmery et al., 2018; Xu et al., 2019b; Pryzant et al., 2020). Literary style transfer (Xu et al., 2012; Pang and Gimpel, 2019) has also been tackled by recent work. Here, we apply attribute transfer to a large dataset annotated in toxicity, but we also use the Yelp review dataset from Shen et al. (2017) for comparison purposes (see Section 4).
Initial unsupervised attribute transfer approaches sought to build a shared and attribute-agnostic latent representation encoding for the input sentence, with adversarial training. Then, a decoder, aware of the destination attribute, generated a transferred sentence (Shen et al., 2017;Hu et al., 2017;Fu et al., 2018;Zhang et al., 2018c;Xu et al., 2018;John et al., 2019).
Unsupervised attribute transfer approaches that do not rely on a latent space are also present in the literature. Li et al. (2018) assumed that style markers are very local and proposed to delete the tokens that most convey the attribute, before retrieving a second sentence in the destination style; they eventually combined both sentences with a neural network. Lample et al. (2019) adapted techniques from unsupervised machine translation, while Wu et al. (2019a) trained models with reinforcement learning. Dai et al. (2019a) introduced unsupervised training of a transformer called StyleTransformer (ST) with a discriminator network. Our approach differs from these unsupervised attribute transfer models in that they neither leverage large pre-trained transformers nor train with a denoising objective.
The work most similar to ours is Nogueira dos Santos et al. (2018), who were the first to train an encoder-decoder to rewrite offensive sentences in a non-offensive register, using non-parallel data from Twitter (Ritter et al., 2010) and Reddit (Serban et al., 2017). Our approach differs in the following aspects. First, we use transformers pre-trained on a large corpus instead of randomly initialized RNNs for encoding and decoding. Second, their approach involves collaborative classifiers to penalize generation when the attribute is not transferred, whereas we train end-to-end with a denoising auto-encoder. Even though their model shows high accuracy scores, it suffers from low fluency, with offensive words often replaced by a placeholder (e.g., "big" instead of "f*cking").
As underlined by Lample et al. (2019), applying Generative Adversarial Networks (GANs) (Zhu et al., 2017) to NLG is not straightforward, because generating text implies a sampling operation that is not differentiable. Consequently, as long as text is represented by discrete tokens, loss gradients computed with a classifier cannot be back-propagated without tricks such as the REINFORCE algorithm (He et al., 2016) or the Gumbel-Softmax approximation (Baziotis et al., 2019), which can be slow and unstable. Besides, controlled text generation (Ficler and Goldberg, 2017; Keskar et al., 2019; Le et al., 2019; Dathathri et al., 2020) is an NLG task in which a language model is conditioned on attributes of the generated text, such as style. A major difference from attribute transfer, however, is the absence of any constraint to preserve the input's content.

Formalization of the attribute text rewriting problem
Let X_T and X_C be our two non-parallel corpora of comments satisfying the respective attributes "toxic" and "civil". Let X = X_T ∪ X_C. We aim at learning a parametric function f_θ mapping a pair of source sentence x and destination attribute a to a fluent sentence y satisfying a and preserving the meaning of x. In our case, there are two attributes, "toxic" and "civil", which we assume to be mutually exclusive. We denote by α(x) the attribute of x and by ᾱ(x) the other attribute (for instance, when α(x) = "civil", then ᾱ(x) = "toxic"). Note that f_θ(x, α(x)) can simply be x.
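For implementers, the two-attribute setup above reduces to a trivial complement operation; a minimal Python sketch (the names are ours, purely illustrative, and not from the paper's code):

```python
# Sketch of the formalization: two mutually exclusive attributes and the
# alpha-bar operation returning the other attribute.
ATTRIBUTES = ("toxic", "civil")

def other_attribute(a: str) -> str:
    """alpha-bar in the text: map each attribute to its complement."""
    assert a in ATTRIBUTES
    return "civil" if a == "toxic" else "toxic"
```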

Our approach is based on bi-conditional encoder-decoder generation
Our approach is to train an autoregressive (AR) language model (LM) conditioned on both the input text x and the destination attribute a.
We compute f_θ with a LM p(y | x, a; θ). As we do not have access to ground-truth targets y, we propose in Section 3.3 a training objective that we assume maximizes p(y | x, a; θ) if and only if y is a fluent sentence that has attribute a and preserves x's content. Additionally, we use an AR generative model where inference of ŷ is sequential and the token generated at step t+1 depends on the tokens generated at the previous steps: p(ŷ_{t+1} | ŷ_{:t}, x, a; θ).
To condition on the input text, we follow the encoder-decoder approach of Bahdanau et al. (2014). Prior work on unsupervised attribute transfer (Section 2.2) allowed us to assume that, with a similar training, encoders can output a latent representation z attending to content rather than to an attribute.
The LM is conditioned on the destination attribute with control codes, introduced by Keskar et al. (2019). A control code is a fixed sequence of tokens prepended to the decoder's input s, supposed to steer generation towards the space of sentences with the destination attribute a. We define γ(a, s) = concat(c(a), s), where c(a) is the control code of attribute a. The noising function η masks tokens randomly with probability 15%; a masked token is then replaced by a random token from the vocabulary with probability 10%, or left as a sentinel (a shared mask token) with probability 90%. We train the model as a denoising auto-encoder (DAE), meaning that we minimize the negative log-likelihood of reconstructing the input from its noised version:

L_DAE(θ) = E_{x∼X} [ −log p(x | η(x), α(x); θ) ].

The hypothesis is that optimizing the DAE objective teaches the model controlled generation. Inspired by an equivalent approach in unsupervised image-to-image style transfer (Zhu et al., 2017), we add a cycle-consistency (CC) objective (Nogueira dos Santos et al., 2018; Edunov et al., 2018; Prabhumoye et al., 2018; Lample et al., 2019; Conneau and Lample, 2019; Dai et al., 2019a):

L_CC(θ) = E_{x∼X} [ −log p(x | ŷ, α(x); θ) ],  where ŷ = f_θ̄(x, ᾱ(x)),

which enforces content preservation in the generated prediction. As the cycle-consistency objective computes a non-differentiable AR pseudo-prediction ŷ during stochastic gradient descent training, gradients are not back-propagated to θ̄ = θ_{τ−1} at training step τ.
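The corruption function η described above can be sketched in plain Python (an illustrative re-implementation, not the paper's code; the sentinel string follows T5's mask-token convention and the exact sampling order is our assumption):

```python
import random

def eta(tokens, vocab, sentinel="<extra_id_0>", p_mask=0.15, p_random=0.10, rng=None):
    """Noising function eta: mask each token with probability 15%; a masked
    position becomes a random vocabulary token with probability 10% and the
    shared sentinel token otherwise (90%)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    noised = []
    for tok in tokens:
        if rng.random() < p_mask:
            noised.append(rng.choice(vocab) if rng.random() < p_random else sentinel)
        else:
            noised.append(tok)
    return noised
```

The DAE objective then trains the model to reconstruct the original tokens from eta(x) and the source attribute's control code.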
Finally, the loss function sums the DAE and the CC objectives with weighting coefficients:

L(θ) = λ_AE L_DAE(θ) + λ_CC L_CC(θ).
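The combined objective is a plain weighted sum; a one-line sketch with the λ values used in Appendix A.1.2 as defaults:

```python
def total_loss(l_dae: float, l_cc: float, lam_ae: float = 1.0, lam_cc: float = 1.0) -> float:
    """Weighted sum of the denoising auto-encoder and cycle-consistency losses."""
    return lam_ae * l_dae + lam_cc * l_cc
```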

The text-to-text bi-transformer architecture
The architectures for the encoder and the decoder are uni-transformers. Let x ∈ X be the input sequence of tokens. It is embedded and then encoded by the uni-transformer encoder: z = Enc(x; θ). Unlike in Vaswani et al. (2017), z is an aggregate sequence representation of the input rather than the full sequence of encoder states. Different heuristics can be used to integrate it in the decoder. We considered summing z to the embedding of each token of the uni-transformer decoder's input s, since this balances the back-propagation of the signals coming from the original input and from the output being generated in the destination attribute space, and it worked well in practice in our experiments.
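The conditioning heuristic (adding the aggregate representation z to every decoder input embedding) can be sketched with plain lists standing in for tensors (illustrative only, not the paper's implementation):

```python
def condition_decoder_input(decoder_embeddings, z):
    """Add the aggregate input representation z (a single vector) to the
    embedding of each token of the decoder's input sequence."""
    return [[e + zi for e, zi in zip(tok_emb, z)] for tok_emb in decoder_embeddings]
```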
In addition, the encoder and decoder uni-transformers share the same embedding layer, and the LM head is tied to the embeddings.
Except for the dense layer computing the latent variable z, all parameters come from the pre-trained bi-transformer published by Raffel et al. (2019). Thus, our DAE and CC objectives fine-tune T5's parameters, which is why we call our model a conditional auto-encoder text-to-text transfer transformer (CAE-T5).

Datasets
We employed the largest publicly available toxicity detection dataset to date, which was used in the 'Jigsaw Unintended Bias in Toxicity Classification' Kaggle challenge. The 2M comments of the Civil Comments dataset stem from a commenting plugin for independent news sites. They were created from 2015 to 2017 and appeared on approximately 50 English-language news sites across the world. Each of these comments was annotated by crowd raters (at least 3 each) for toxicity and toxicity subtypes (Borkan et al., 2019).
Following the work of Dai et al. (2019a) for the IMDB Movie Review dataset (positive/negative sentiment labels), we constructed a sentence-level version of the dataset. First, we fine-tuned a pre-trained BERT (Devlin et al., 2019) toxicity classifier on the Civil Comments dataset. Then, we split the comments into sentences with NLTK's sentence tokenizer. Finally, we created X_T (respectively X_C) from the sentences whose system-generated toxicity score (using our BERT classifier) is greater than 0.9 (respectively less than 0.1), so as to increase the dataset's polarity. The test ROC-AUC of the toxicity classifier is 0.98, with a precision of 0.95 and a recall of 0.38. Even with this low recall, |X_T| is large enough (approx. 90k, see Table 2).
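The polarity-increasing filtering step can be sketched as follows (the scores stand in for outputs of the fine-tuned BERT classifier; the thresholds are the ones quoted above):

```python
def split_by_toxicity(scored_sentences, hi=0.9, lo=0.1):
    """Build X_T and X_C from (sentence, toxicity_score) pairs, keeping only
    confidently scored sentences to increase the dataset's polarity."""
    x_toxic = [s for s, score in scored_sentences if score > hi]
    x_civil = [s for s, score in scored_sentences if score < lo]
    return x_toxic, x_civil
```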
We also conducted a comparison to other style transfer baselines on the Yelp Review Dataset (Yelp), commonly used to benchmark unsupervised attribute transfer systems. Table 2 shows statistics for both datasets.

Evaluation
Evaluating a text-to-text task is challenging, especially when no gold pairs are available. Attribute transfer is successful if generated text: 1) has the destination control attribute, 2) is fluent and 3) preserves the content of the input text.

Automatic evaluation
We follow the current approach of the community and rely on three automatic metrics:
1. Attribute control: Accuracy (ACC) computes the rate of successful changes in attribute. It measures how well generation is conditioned on the destination attribute. We predict toxic and civil attributes with the same fine-tuned BERT classifier that pre-processed the Civil Comments dataset (single threshold at 0.5).
2. Fluency: measured by the perplexity (PPL) of the generated text under a fine-tuned GPT-2 language model (see Appendix A.1.3); lower is better.
3. Content preservation: typically measured with metrics based on matching words (e.g., BLEU; Papineni et al., 2002) between the generated prediction and the reference(s) (ref-metric). However, as we do not have such paired samples, we compute a content preservation score between the input and the generated sentence (self-metric).
Table 3 shows the BLEU scores (based on exact matches) of three examples rephrased by human annotators (Section 4.5). In the top-most example, the BLEU score is high: only 4 words differ between the two texts. In contrast, the two texts of the second example have only 1 word in common, so the BLEU score is low. Despite the low score, however, the candidate text could have been a valid rephrasing of the reference text.

Content preservation
The high complexity of our task motivates a more general quantitative metric between the input and the generated text, one capturing semantic similarity rather than overlapping tokens. Previous work proposed to represent sentences as a (weighted) average of their word embeddings before computing the cosine similarity between them. We adopt a similar strategy, but embed sentences with the pre-trained universal sentence encoder (Cer et al., 2018), and call the result the sentence similarity score (SIM). The first two sentence pairs of Table 3 have high similarity scores: the rephrasings preserve the original content while not necessarily overlapping much with the original text. The last rephrasing, however, does not preserve the initial content and has a low similarity score with its source sentence. As statistical evidence, the self-SIM score comparing each of the 1,000 test Yelp reviews with their human rewriting is 80.2%, whereas the self-SIM score comparing the Yelp review test set to a random derangement of the human references is 36.8%.
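The SIM score is a cosine similarity between sentence embeddings; a self-contained sketch with plain vectors standing in for universal-sentence-encoder outputs:

```python
import math

def sim(u, v):
    """Cosine similarity between two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```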
We optimize for all three metrics, since improving only a subset typically comes at the expense of the remaining metric(s). We aggregate the scores of the three metrics by computing the geometric mean (GM) of ACC, 1/PPL and self-SIM.

Human evaluation
Following prior practice, and to further confirm the performance of CAE-T5, we hired human annotators on Appen to rate, in a blind fashion, different models' civil rephrasings of 100 randomly selected test toxic comments, in terms of attribute transfer (Att), fluency (Flu), content preservation (Con) and overall quality (Over), on a Likert scale from 1 to 5. Each rephrasing was annotated by 5 different crowd-workers whose annotation quality was controlled with test questions. If a rephrasing is rated 4 or 5 on Att, Flu and Con, then it is counted as "successful" (Suc).
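The aggregate score follows directly from its definition (geometric mean of ACC, 1/PPL and self-SIM); a minimal sketch:

```python
def gm_score(acc: float, ppl: float, self_sim: float) -> float:
    """Geometric mean of ACC, 1/PPL and self-SIM; higher is better,
    so a lower perplexity increases the score."""
    return (acc * (1.0 / ppl) * self_sim) ** (1.0 / 3.0)
```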

Baselines
We compare the output text that CAE-T5 generates with a selection of unpaired style transfer models described in Section 2.2 (Shen et al., 2017; Li et al., 2018; Fu et al., 2018; Luo et al., 2019; Dai et al., 2019a). We also compare with Input Masking, inspired by an interpretability method called Input Erasure (IE) (Li et al., 2016). IE is used to interpret the decisions of neural models: words are removed one at a time and the altered texts are re-classified (i.e., as many re-classifications as there are words); all the words whose removal decreases the classification score by more than a threshold are returned as those most related to the decision of the neural model. Our baseline follows a similar process, but instead of deleting, it uses a pseudo-token ('[MASK]') to mask one word at a time. Once all the masked texts have been scored by the classifier, the rephrased text is returned, with a mask at every token whose removal decreased the classification score by more than a threshold (set to 20% after preliminary experiments). We employed a pre-trained BERT as our toxicity classifier, fine-tuned on the Civil Comments dataset (see Section 4.1).

Table 4 shows quantitative results on the Civil Comments dataset. Surprisingly, the perplexity (capturing fluency) of text generated by our model is lower than the perplexity computed on human comments. This can be explained by authors of social media comments exhibiting considerable variability in their adherence to formal language rules, which CAE-T5 only partially replicates. Other approaches such as StyleTransformer (ST) and CrossAlignment (CA) reach higher accuracy, but at the cost of both higher perplexity and lower content preservation, meaning that they are better at discriminating toxic phrases but struggle to rephrase them in a coherent manner.
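The Input Masking baseline can be sketched as below; `classify` stands in for the fine-tuned BERT toxicity classifier (returning a score in [0, 1]), and the toy keyword classifier exists only to make the sketch runnable:

```python
def input_masking(tokens, classify, threshold=0.2, mask="[MASK]"):
    """Mask one token at a time, re-score with the classifier, and keep a
    mask wherever removing the token lowered the toxicity score by more
    than the threshold (20% after preliminary experiments)."""
    base = classify(tokens)
    out = []
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        out.append(mask if base - classify(masked) > threshold else tok)
    return out

def toy_classify(tokens):
    """Toy stand-in classifier: 'toxic' iff a trigger word is present."""
    return 1.0 if "idiot" in tokens else 0.0
```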

Quantitative comparison to prior work
In Table 5 we compare our model to prior attribute transfer work by computing evaluation metrics for different systems on the Yelp test dataset. We achieve competitive results, with low perplexity and good sentiment control (above human references). Our similarity, though, is lower, showing that some content is lost when decoding; hence the latent space does not fully capture the semantics. It is fairer to compare our model to other style transfer baselines on the Yelp dataset, since our model is based on sub-word tokenization while the baselines often rely on limited-size pre-trained word embeddings: many more words from the Civil Comments dataset would be mapped to the unknown token if we wanted to keep a reasonably sized vocabulary, resulting in a performance drop.
The human evaluation results shown in Table 6 correlate with the automatic evaluation results.
When considering the aggregated scores (geometric mean, success rate and overall human judgement), our model ranks first on the Civil Comments dataset and second on the Yelp Review dataset, behind DualRL; our approach, however, is more stable and therefore easier to train than reinforcement learning approaches.

Qualitative analysis
Table 7 shows examples of rephrasings of toxic comments automatically generated by our system. The first two examples highlight the model's ability to perform fluent controlled generation conditioned on both the input sentence and the destination attribute. We present further results, showing that we can effectively suggest fluent civil rephrasings of toxic comments, in the Appendix Table 8. However, we observe more failures than in the sentiment transfer task. We identify three types of failure:
• Supererogation: generation does not stop early enough and produces fluent, transferred, related but unnecessary content.
• Hallucination: conditioning on the initial sentence fails and the model generates fluent but unrelated content.
• Position reversal: the author's opinion is shifted.
In order to assess the frequency of hallucination and supererogation, we randomly selected 100 toxic comments from the test set and manually labeled the generated sentences with the non-mutually exclusive labels "contains supererogation" and "contains hallucination". We counted on average 17% of generated sentences with supererogation and 34% showing (often local) hallucination. We observe that the longer the input comment, the more prone the generated text is to hallucination.
While supererogation and hallucination can be explained by the probabilistic nature of generation, we assume that position reversal is due to a bias in the dataset, where toxic comments are correlated with negative comments. Thus, offensive comments tend to be transferred into supportive comments, even though a human being would rephrase attacks as polite disagreements. Interestingly, our model is also able to add toxicity to civil comments, as shown by the examples in the Appendix Table 10. Even if such an application is of limited interest to online platforms, it is worth warning about its potential misuse.

Input: try reading and be a little more informed about it before you try to make a comment. this is absolutely the most idiotic post i have ever read on all levels.
CAE-T5: this is absolutely the most important thing i have read on this thread over the years.
Input: trump may be a moron, but clinton is a moron as well.
CAE-T5: trump may be a clinton supporter, but clinton is a trump supporter as well.

Discussion
Supervised learning is a natural approach for text-to-text tasks. In our study, we submitted the task of civil rephrasing of toxic comments to human crowd-sourcing. We randomly sampled 500 sentences from the toxic training set. For each sentence, we asked 5 annotators to rephrase it in a civil way, and to assess whether the comment was offensive and whether it was possible to rewrite it in a way that is less rude while preserving the content. Of the 2,500 answers, we tally 427 examples that were not flagged as impossible to rewrite and whose rephrasing differs from the original sentence. This low 17.1% yield is caused by two main issues. On the one hand, unfortunately, not all toxic comments can be reworded in a civil manner so as to express a constructive point of view; severely toxic comments that consist solely of insults, identity attacks, or threats are not "rephrasable". On the other hand, evaluating crowd-workers with test questions and answers is complex. The perplexity being higher on crowd-workers' rephrasings than on randomly sampled civil comments raises concerns about the production of human references via crowd-sourcing. The nature of large datasets labeled in toxicity and the lack of incentives for crowd-sourcing civil rephrasing annotations make it expensive and difficult to train systems in a supervised framework. These limitations motivate unsupervised approaches.
Lastly, the more complex the unsupervised attribute transfer task, the more difficult its automatic evaluation. In our case, evaluating whether the attribute is actually transferred requires training an accurate toxicity classifier. Furthermore, the language model we use to assess the fluency of the generated sentences has limitations and does not generalize to all the varieties of language encountered in social media. Finally, measuring the amount of relevant content preserved between the source and generated texts remains a challenging, open research topic.

This work is, to our knowledge, the second to tackle civil rephrasing, and the first to address it with a fully end-to-end, discriminator-free, text-to-text self-supervised training. CAE-T5 leverages the NLU/NLG power offered by large pre-trained bi-transformers. The quantitative and qualitative analysis shows that ML systems could contribute to some extent to pacifying online conversations, even though many generated examples still suffer from critical semantic drift.
In the future, we plan to explore whether decoding can benefit from non-autoregressive (NAR) generation (Ma et al., 2019; Ren et al., 2020). We are also interested in the recent paradigm shift proposed by Kumar and Tsvetkov (2019), where the representation of generated tokens is continuous, allowing more flexibility in plugging in attribute classifiers without sampling.

A.1.1 Preprocessing
Texts are tokenized into sub-word units with SentencePiece (Kudo and Richardson, 2018) and eventually truncated to a maximum sequence length of 32 for the Yelp dataset and 128 for the processed Civil Comments dataset. The control codes are c(a) = concat(a, ": ") for attributes a ∈ {"positive", "negative"} in the sentiment transfer task and a ∈ {"toxic", "civil"} when we apply the model to the Civil Comments dataset.

A.1.2 Training details
During training, we apply dropout regularization at a rate of 0.1. We set λ_AE = λ_CC = 1.0. In preliminary experiments, we observed that λ_CC = 0 preserved little content from the initial sentence, and that λ_CC = 2λ_AE weighted preservation too heavily, at the cost of accuracy. Therefore we focused our experiments on λ_CC = λ_AE, a good default setting since we have no a priori knowledge about the balance between fluency, accuracy (enforced by the auto-encoder) and content preservation (enforced by cycle consistency). DAE and back-transfer (in the course of the CC computation) are trained with teacher forcing; we do not need AR generation there, since we have access to a target for the decoder's output. Each training step computes the loss on a mini-batch of 64 sentences sharing the same attribute. Mini-batches of attributes a and ā are interleaved. Since the Civil Comments dataset is class-imbalanced, we sample comments from the civil class of the training set at each epoch. The optimizer is AdaFactor (Shazeer and Stern, 2018) and we train for 88,900 steps, taking 19 hours on a TPU v2 chip.
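The interleaving of same-attribute mini-batches can be sketched with lists standing in for batches (illustrative only; the real training loop operates on tensors):

```python
def interleave_batches(toxic_batches, civil_batches):
    """Alternate mini-batches of the two attributes, as done during training:
    each step sees a batch of a single attribute, and attributes alternate."""
    out = []
    for t_batch, c_batch in zip(toxic_batches, civil_batches):
        out.extend([t_batch, c_batch])
    return out
```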

A.1.3 Evaluation details
Decoding is greedy. The parametric models used to compute ACC and PPL are 12-layer, 12-head uni-transformers with hidden size 768, pre-trained and then fine-tuned. The BERT classifier is an encoder followed by a sequence classification head, and the GPT-2 LM is a decoder with a LM head on top. We use the sacrebleu implementation for BLEU and the universal sentence encoder pre-trained by Google to compute SIM.
A.2 CAE-T5 learning algorithm
Algorithm 1 and Figure 1 describe the fine-tuning procedure of CAE-T5. H denotes the cross-entropy.
Algorithm 1: CAE-T5 fine-tuning
Input: T5's pre-trained parameters θ_0; unpaired dataset labelled in toxicity X = X_T ∪ X_C
Output: CAE-T5's fine-tuned parameters
for each training step τ do
    Compute the DAE and CC losses on a mini-batch (Section 3.3)
    Back-propagate gradients through θ
    Update θ_τ by a gradient descent step
end

Figure 2 illustrates the flows through the encoder-decoder model at inference. Figures 3 and 4 detail the guidelines we wrote on the crowd-sourcing website Appen, where we asked human crowd-workers to rate automatic rephrasings and to rephrase toxic comments. The contributor level is set to level 3, which corresponds to the highest quality standard.

Figure 1: (a) DAE: the noised input is encoded into a latent variable z that is then decoded conditioned on the source attribute α(x), with the objective of minimizing the cross-entropy between x and the generated text x̂. Here, generation is not AR since the DAE is trained with teacher forcing. (b) CC: the input x is pseudo-transferred to attribute ᾱ(x) with AR decoding, because we do not know the ground truth y. The generated output ŷ is then back-transferred to the original space of sentences with attribute α(x). Back-transfer generation is not AR because we use teacher forcing here. Thus, we can trivially back-propagate the gradients through f_θ (back-transfer) but not through f_θ̄ (pseudo-transfer). x and ŷ are the input and generated text, a is the destination attribute, z is the latent input representation, and γ is the function prepending control codes to the decoder's input. θ represents the model's parameters.

Toxic to Civil
Input: and then they need to do what it takes to get rid of this mentally ill bigot!
CAE-T5: and then they need to do what they can to get rid of this kind of man that is causing the whole problem!

Input: do you think any of those idiots vote?
CAE-T5: do you think any of those people vote?

Input: this is just so stupid.
CAE-T5: this is just so sad.

Input: are you nuts?
CAE-T5: are you kidding me?

Input: we are all entitled to our own opinions but not the facts that one can't find in this criminally stupid opinion.
CAE-T5: we are all entitled to our own opinion but not to the kind of facts that can be found in this case.

Input: hopefully that will prevent him from making any further idiotic statements.
CAE-T5: hopefully that will prevent him from making any more such statements.

Input: the poll is garbage.
CAE-T5: the poll is fake news.

Input: it was irresponsible to publish this garbage.
CAE-T5: it was irresponsible to publish this information.

Input: just because someone else has a different opinion does not mean that person is stupid, uneducated or foolish.
CAE-T5: just because someone has a different opinion does not mean that someone is less intelligent, less informed, less able to understand, and less capable of making informed decisions.

(In the original table, highlighting marks position reversal and hallucination.)
Positive to Negative

Input: portions are very generous and food is fantastically flavorful .
DualRL: portions are very thin and food is confusing .
ST (Multi): portions are very poorly and food is springs flavorless .
CAE-T5: portions are very small and food is awfully greasy for the price .
Human: portions are very small and food is not flavorful .

Input: staff : very cute and friendly .
DualRL: staff : very awful and rude .
ST (Multi): staff : very nightmare and poor .
CAE-T5: staff : very rude and pushy .
Human: staff : very ugly and mean .

Input: friendly and welcoming with a fun atmosphere and terrific food .
DualRL: rude and unprofessional with a loud atmosphere and awful food .
ST (Multi): poor and fake with a fun atmosphere and mushy food .
CAE-T5: rude and unhelpful service with a forced smile and attitude .
Human: unfriendly and unwelcoming with a bad atmosphere and food .