Multi-granularity Textual Adversarial Attack with Behavior Cloning

Recently, textual adversarial attack models have become increasingly popular due to their success in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategies (e.g., word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) they need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address such problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model that effectively generates high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning-based method to train a multi-granularity attack agent through behavior cloning with the expert knowledge from our MAYA algorithm, further reducing the query times. Additionally, we adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT, and RoBERTa in two different black-box attack settings and on three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the query times in both attack settings. Our codes are released at https://github.com/Yangyi-Chen/MAYA.


Introduction
Deep learning has been proven successful in many real-world applications such as spam filtering (Stringhini et al., 2010), autonomous driving (Chen et al., 2017), and face recognition (Sun et al., 2015). However, these powerful models are vulnerable to adversarial samples, crafted by adding small, human-imperceptible perturbations to the input (Goodfellow et al., 2015; Szegedy et al., 2014).
(* Work done during internship at CCIIP. † Equal contribution. ‡ Corresponding author.)
In computer vision, numerous adversarial attack models have been proposed to benchmark and interpret black-box deep learning models (Dong et al., 2018; Moosavi-Dezfooli et al., 2016; Carlini and Wagner, 2017; Kurakin et al., 2017), and corresponding defense methods have been proposed to tackle adversarial security issues (Dziugaite et al., 2016; Xie et al., 2018; Kurakin et al., 2017; Tramèr et al., 2018). However, crafting textual adversarial samples is more challenging due to the discrete and non-differentiable nature of the text space. Indeed, most existing works focus on a single granularity of modification strategies, such as sentence-level (Jia and Liang, 2017; Iyyer et al., 2018), word-level (Zang et al., 2020; Ren et al., 2019), or character-level (Eger et al., 2019). Thus, none of these attack models searches across multiple granularities simultaneously, which is more efficient for generating effective adversarial samples while preserving semantic consistency and language fluency. To this end, we propose a simple and novel attack model targeting multiple granularities, called MAYA, which achieves a higher attack success rate with fewer queries to victim models and produces higher-quality adversarial samples than baseline attack models. Specifically, we add perturbations to the original sentence by rewriting its constituents under strict grammatical constraints.
Besides, almost all current attack models need to query victim models hundreds or even thousands of times to launch a successful attack, and they assume that the victim models output the confidence scores of their predictions, which is neither efficient nor practical in real-world situations. To alleviate these problems, we propose to train a multi-granularity attack agent, called MAYA_π, through behavior cloning (Torabi et al., 2018) with the expert knowledge from our MAYA algorithm.
We conduct exhaustive experiments, attacking three victim models over three benchmark datasets in two different black-box settings, namely score-based and decision-based attack, to evaluate the effectiveness of our attack models. The former assumes both the labels and the confidence scores of the victim models are available, while the latter assumes only the label information can be accessed, which is more challenging and rarely investigated.
Experimental results demonstrate the superiority of our attack models. Specifically, MAYA overall outperforms all baseline models in terms of attack success rate, attack efficiency, and quality of adversarial samples. MAYA_π achieves attack success rates and adversarial sample quality comparable to baseline models while significantly reducing the query times in both black-box settings. Furthermore, we apply MAYA_π to attack open-source NLP frameworks to demonstrate its practicality and effectiveness in practice.
To summarize, the main contributions of this paper are as follows:
• Different from previous works that concentrate on only a single granularity, we propose an effective multi-granularity attack model to generate fluent and grammatical adversarial samples with fewer queries to victim models.
• We propose an RL-based method to train an agent through behavior cloning with the expert knowledge from our multi-granularity attack model and demonstrate its efficiency and power in two black-box settings, proving the effectiveness of our adapted imitation algorithm.
• We successfully handle the issues of decision-based black-box attack, which is rarely investigated in NLP.

Related Work
Existing textual adversarial attack models can be roughly categorized according to the granularity of modification, e.g., character-level, word-level, sentence-level.
Sentence-level attack models include paraphrasing original sentences following pre-defined syntax patterns (Iyyer et al., 2018), adding an irrelevant sentence to the end of the passage to distract models (Jia and Liang, 2017), and conducting domain shift on original sentences (Wang et al., 2020). However, sentence-level attacks usually neglect finer-grained granularities, such as word-level, resulting in a low attack success rate.
Word-level attack is relatively more investigated and can be modeled as a combinatorial optimization problem (Zang et al., 2020) consisting of finding substitution words and searching for adversarial samples. Methods for finding candidate substitutes mainly rely on word-embedding similarity (Jin et al., 2019), WordNet synonyms (Ren et al., 2019), HowNet synonyms (Zang et al., 2020), and masked language models (MLM) (Li et al., 2020). The search algorithms generally involve greedy search (Ren et al., 2019; Liang et al., 2018; Jin et al., 2019), genetic algorithms (Alzantot et al., 2018), and particle swarm optimization (Zang et al., 2020). Although these attack models can achieve relatively high attacking performance, considering only a single granularity limits the upper bound of word-level attack models' performance, and almost all of them need to query victim models hundreds of times to launch a successful attack.
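The greedy-search family of word-level attacks can be sketched as follows. This is an illustrative toy implementation, not the exact algorithm of any cited paper; `victim_confidence` (the model's confidence in the true label) and the synonym table are hypothetical stand-ins for a real victim model and substitute source.

```python
def greedy_attack(sentence, synonyms, victim_confidence, true_label):
    """Greedily replace one word at a time, keeping the substitution
    that causes the largest drop in the victim's confidence.

    Returns (adversarial_sentence, query_count), or (None, query_count)
    if the attack fails. A label flip is approximated here as the
    confidence on the true label falling below 0.5 (binary case)."""
    words = sentence.split()
    queries = 1
    base = victim_confidence(" ".join(words), true_label)
    for i, w in enumerate(words):
        best_conf, best_sub = base, None
        for sub in synonyms.get(w, []):
            cand = words[:i] + [sub] + words[i + 1:]
            conf = victim_confidence(" ".join(cand), true_label)
            queries += 1
            if conf < best_conf:
                best_conf, best_sub = conf, sub
        if best_sub is not None:      # keep the best substitution at position i
            words[i] = best_sub
            base = best_conf
        if base < 0.5:                # label flipped: attack succeeded
            return " ".join(words), queries
    return None, queries              # attack failed
```

Note how the query count grows with the number of words times the number of substitutes per word, which is the inefficiency the paper targets.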
Character-level attacks make different modifications to words such as swapping, deleting, and inserting characters (Ebrahimi et al., 2018;Belinkov and Bisk, 2018;Gao et al., 2018). These attack models often craft ungrammatical adversarial samples and can be easily defended (Pruthi et al., 2019;Jones et al., 2020). Hence, in this work, we do not incorporate character-level modification into our multi-granularity framework.
To sum up, all the above models consider only a single granularity and are thus insufficient for exploring the textual space. We therefore propose to launch attacks at multiple granularities in this paper. Experimental results demonstrate the effectiveness and efficiency of our method.

Methodology
In this section, we first describe our multi-granularity attack (MAYA) model in detail. Then we introduce how to train an attack agent, denoted as MAYA_π, with the knowledge from our MAYA algorithm. Finally, we describe how we adapt MAYA_π to perform decision-based black-box attacks.

Multi-granularity Adversarial Attack
Our MAYA model comprises three parts: generating adversarial candidates (Generate), verifying whether an attack succeeds (Verify), and picking the most potential candidate if no successful attack is found (Pick). The whole process is shown as pseudocode in Appendix A.
Generate Given the input sentence S = [w_0, ..., w_i, ..., w_n], we first conduct constituency parsing on the original sentence using SuPar to obtain its constituents. Then we generate adversarial candidates from two different perspectives.
First, for each constituent (including the whole sentence), i.e., each granularity of modification except word-level, we employ various paraphrase models to generate adversarial samples by rewriting the specified constituents while keeping the rest unchanged. However, such a rewrite is only a local modification and may cause syntactic inconsistency in the whole sentence, so we adopt the following rules to make the process more rational:
• The number of grammatical mistakes in a generated adversarial candidate must be less than or equal to that of the original sentence, which can be checked by LanguageTool.
• The chosen adversarial candidate should be the one most similar to the original, i.e., preserving as much of the semantic information of the given sentence as possible. Specifically, Sentence-BERT (Reimers and Gurevych, 2019) is adopted to encode the sentence and its candidates, and a similarity function (e.g., cosine) measures the semantic preservation.
The filtered candidates are collected into a set (denoted as V_p). Next, for word-level perturbation, we mask words in the original sentence one by one to generate corresponding adversarial candidates. Specifically, for w_i, we generate the adversarial candidate S_{w_i} = [w_0, ..., [MASK], ..., w_n]. We collect all adversarial candidates generated in this way into a set (denoted as V_s).
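The word-level half of the Generate step — building V_s by masking each position in turn — can be sketched as below; the paraphrase-based set V_p, which requires external paraphrase models, is omitted from this sketch.

```python
def masked_candidates(sentence, mask_token="[MASK]"):
    """Word-level candidates (the set V_s): for each position i,
    produce a copy of the sentence with w_i replaced by [MASK]."""
    words = sentence.split()
    v_s = []
    for i in range(len(words)):
        v_s.append(" ".join(words[:i] + [mask_token] + words[i + 1:]))
    return v_s
```

Each masked sentence is later completed with MLM substitutes only if it is selected, which is what keeps the query budget low.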
Verify Given all adversarial candidates, we query the victim model for decisions and confidence scores. If no adversarial candidate successfully fools the victim model, we enter the Pick step, which we discuss later. If one or more successful adversarial candidates are found, there are three cases that we address differently. First, if all successful candidates come from V_p, we choose the one that retains the most semantics, measured by the cosine similarity of sentence embeddings, as the final adversarial sample. Second, if successful candidates come from both V_p and V_s, we choose only candidates from V_p, following the same rule as in the first case. We ignore candidates from V_s because they would require filling the [MASK] token with substitutes and repeatedly querying the victim model for decisions, which is inefficient when we already have successful candidates from V_p. Finally, if all successful candidates come from V_s, we need to fill the [MASK] token with substitutes to verify their success. Since the workflow is the same as in the Pick step, we directly view each successful candidate as S and move to the second case of the Pick step.
Pick If no successful candidate is found, we need to pick the most potential candidate as the new sentence and repeat the same Generate and Verify procedures to find an adversarial sample. Our criterion is the decrease in the victim model's confidence score. Here we denote the candidate that causes the biggest drop in the victim model's confidence score as S. There are two cases. First, when S comes from V_p, we directly choose S as the most potential candidate and return to the Generate step. Second, when S comes from V_s, we need to fill the [MASK] token with substitutes to construct a complete sentence. Following Li et al. (2020), we use an MLM (Devlin et al., 2019) to generate k substitutes for the [MASK] position in S and utilize WordNet (Fellbaum, 1998) to filter out antonyms of the original words. Then we iteratively substitute the [MASK] token with candidates in descending order of the probability computed by the MLM and query the victim model for confidence scores. If one substitution successfully fools the victim model, we return the whole sentence as the final adversarial sample. Otherwise, we obtain the sentence, denoted as S_w, that causes the biggest drop in the victim model's confidence score. We compare S_w with all candidates from V_p, choose the one that causes the biggest drop in the confidence score as the most potential candidate, and return to the Generate step.
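The core of the Pick step — selecting the candidate that causes the biggest drop in the victim's confidence — can be sketched as follows; `victim_confidence` is a hypothetical stand-in for querying the real victim model.

```python
def pick_most_potential(original_conf, candidates, victim_confidence, true_label):
    """Return the candidate with the largest confidence drop and the drop
    itself. Each evaluation costs one query to the victim model."""
    best_drop, best_cand = float("-inf"), None
    for cand in candidates:
        drop = original_conf - victim_confidence(cand, true_label)
        if drop > best_drop:
            best_drop, best_cand = drop, cand
    return best_cand, best_drop
```

In MAYA this selection runs over candidates from both V_p and V_s; the agent MAYA_π introduced next exists precisely to replace these per-candidate victim queries with its own predictions.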

Combined with Behavior Cloning
As seen in Figure 1, we use BERT-base (Devlin et al., 2019) and a linear classifier with one output unit as the architecture of MAYA_π. The core function of MAYA_π is to predict the most potential candidate without querying the victim model. In this section, we first describe how we exploit MAYA_π to launch an adversarial attack, because MAYA_π must perform the full attack procedure during training. We then detail the training process.

Launch an Adversarial Attack
Now assume that we have already trained an attack agent MAYA_π. Given the input sentence S, we follow the same procedure as the Generate step of the MAYA algorithm to produce the adversarial candidate sets V_p and V_s, corresponding to the two generation processes. Then, with the original sentence S and an adversarial candidate S_i concatenated as input, MAYA_π outputs a score measuring its tendency to choose that specific candidate. We obtain the candidate S that gets the highest score. Again, there are two cases.
First, when S comes from V p , we directly use S to query the victim model. If it successfully fools the victim model, we return S as the final adversarial sample. Otherwise, we view S as the most potential candidate and return to the Generate step.
Second, when S comes from V_s, we follow the same procedure as the Pick step of the MAYA algorithm, iteratively substituting the [MASK] token with candidate words and querying the victim model for confidence scores. If a successful candidate is found, we directly return it as the final adversarial sample. Otherwise, we view the candidate that causes the biggest drop in the confidence score as the most potential candidate and return to the Generate step. The whole process repeats until a successful adversarial sample is found or all potential candidates have already been encountered. In the next subsection, we describe how we adapt this score-based Pick step to launch a decision-based black-box attack.
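A single agent-side step can be sketched as below; `agent_score` and `victim_label` are hypothetical stand-ins for the BERT-based scorer and the victim model. The key point is that candidates are ranked by the agent alone, so only the top-ranked candidate costs a query.

```python
def agent_attack_step(original, candidates, agent_score, victim_label, true_label):
    """One MAYA_pi step (sketch): rank all candidates with the agent's
    own scorer instead of querying the victim, then spend a single
    victim query on the top-ranked candidate.

    Returns (candidate, success). On failure the candidate becomes the
    most potential candidate for the next Generate round."""
    best = max(candidates, key=lambda c: agent_score(original, c))
    if victim_label(best) != true_label:
        return best, True    # successful adversarial sample
    return best, False       # recurse: treat as the new starting sentence
```

This is what reduces the query count from one per candidate (as in the MAYA Pick step) to one per round.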

Training Process
In this subsection, we describe our RL-based method to train MAYA π through Behavior Cloning with the expert knowledge from our MAYA algorithm. Specifically, we improve the training process by adapting the Dataset Aggregation (DAGGER) method (Ross et al., 2011). The training process incorporates three parts, namely initialization, sampling trajectories, and training.
Initialization We initialize MAYA_π with pre-trained weights from BERT (Devlin et al., 2019) and a randomly initialized MLP. We also initialize an empty trajectory dataset D.
Sampling Trajectories To train MAYA_π, we need to interact with the victim model to obtain training data. Specifically, we train a local victim model that has the same architecture as the target victim model, expecting it to approximate the decision boundary of the target victim model.
We sample a batch of original sentences. For each sentence S_0, we generate adversarial candidates C_1, ..., C_k. As in the Verify and Pick steps of the MAYA algorithm, one specific candidate is chosen as the final successful adversarial sample or the most potential candidate. We view the candidate chosen by the MAYA algorithm as the ground-truth label and add ((S_0, C_1, ..., C_k), label) to our trajectory dataset D.
To fully train an agent that can handle different situations, we need a large dataset D, so we adapt the DAGGER method. Specifically, when receiving the ground-truth label from the MAYA algorithm, MAYA_π does not take the golden action indicated by MAYA; it takes the action based on its own prediction. That is, MAYA_π predicts which candidate will most confuse the victim model and follows its own procedure for launching an adversarial attack. The predicted candidate is treated as the new S_0, and we continue the same trajectory-sampling process to augment the dataset D.
Training We then train MAYA_π for one epoch on the trajectory dataset D. We model the training task as a multi-class classification problem. For each sample ((S_0, C_1, ..., C_k), label) drawn from D, we concatenate S_0 with each C_i. We then input the k concatenated sentences to MAYA_π to get k scores, treat these scores as logits, and use the cross-entropy loss to train MAYA_π. After training, we clear D and continue the sampling procedure. Implementation details are described in Appendix I.
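The training objective can be sketched numerically as follows: the k candidate scores are treated as the logits of a k-way classification whose gold class is the candidate chosen by MAYA. This is a minimal NumPy sketch of the loss, not the actual BERT training code.

```python
import numpy as np

def candidate_cross_entropy(scores, label_index):
    """Cross-entropy over candidate scores treated as logits.

    `scores` are the k outputs of the agent's linear head (one per
    concatenated (S_0, C_i) pair); `label_index` is the candidate
    MAYA selected in the Verify/Pick steps."""
    scores = np.asarray(scores, dtype=float)
    logits = scores - scores.max()                    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum()) # log-softmax
    return -log_probs[label_index]
```

With uniform scores the loss is log k, and it shrinks as the agent assigns a higher score to MAYA's chosen candidate.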

Adapted to Decision-based Attack
To adapt MAYA_π to decision-based attack, we only need to modify one step of the attack procedure described in Section 3.2.1 while keeping the other steps unchanged. Specifically, when the candidate S that gets the highest score from MAYA_π comes from V_s, we iteratively substitute the [MASK] token with candidate words to generate adversarial candidates and query the victim model for decisions. If one candidate successfully flips the label, we treat it as the final adversarial sample. Otherwise, to generate adversarial samples more efficiently, we take the candidate whose sentence embedding has the lowest cosine similarity with that of the original sentence as the most potential candidate. Our intuition is that the candidate least resembling the original sentence is the most likely to be a successful adversarial sample.
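The decision-based fallback — picking the candidate least similar to the original when no confidence scores are available — can be sketched with cosine similarity over sentence embeddings. This is a NumPy sketch; in the paper the embeddings would come from Sentence-BERT.

```python
import numpy as np

def least_similar_candidate(orig_emb, cand_embs):
    """Return the index of the candidate embedding least similar
    (by cosine) to the original sentence embedding."""
    orig = np.asarray(orig_emb, dtype=float)
    orig = orig / np.linalg.norm(orig)
    sims = []
    for c in cand_embs:
        c = np.asarray(c, dtype=float)
        sims.append(float(np.dot(orig, c / np.linalg.norm(c))))
    return int(np.argmin(sims))
```

This heuristic replaces the confidence-drop criterion, which cannot be evaluated when the victim only returns labels.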

Experiments
We conduct comprehensive experiments to evaluate our attack models on the tasks of sentiment analysis, natural language inference, and news classification.

Datasets and Victim Models
For sentiment analysis, we choose SST-2 (Socher et al., 2013), a binary sentiment classification benchmark. For natural language inference, we choose the mismatched MNLI dataset (Williams et al., 2018). For news classification, we choose AG's News (Zhang et al., 2015), where models must assign an instance to one of four classes: World, Sports, Business, and Sci/Tech.
We evaluate our attack models by attacking three victim models including BiLSTM (Schuster and Paliwal, 1997), BERT (Devlin et al., 2019), and RoBERTa (Liu et al., 2019). Details of the datasets and the classification accuracy of victim models are listed in Table 1.

Attack Models
We implement all baseline attack models using the NLP attack packages TextAttack (Morris et al., 2020) and OpenAttack (Zeng et al., 2021).

Score-based Attack Models
We comprehensively compare our score-based attack models with five representative and strong score-based baselines, including (1) GA+Embedding (Alzantot et al., 2018).
MAYA_bt We observe in preliminary experiments that using only a back-translation model achieves comparable performance in most cases and is more computationally efficient. We therefore also implement MAYA with only the back-translation model, denoted as MAYA_bt.
MAYA_π Because MAYA and MAYA_bt perform similarly most of the time, we train our attack agent through behavior cloning with the expert knowledge from MAYA_bt, in consideration of the efficiency of training and of launching adversarial attacks.

Decision-based Attack Models
We consider two decision-based baseline models: (1) GAHard (Maheshwary et al., 2020) and (2) SCPN (Iyyer et al., 2018). Details of the baseline models are listed in Appendix B. We conduct exhaustive experiments to compare our decision-based MAYA_π* with existing decision-based attack models.

Experimental Settings
Hyper-parameters For our attack models, we set the number of word substitutes k to 10. For MAYA, to ensure the quality of successful adversarial samples, we discard adversarial samples whose number of modifications exceeds 8, 8, and 12 on SST-2, MNLI, and AG's News respectively, reflecting the different average sentence lengths of the three datasets. Besides, we set a maximum query number of 15,000 for all attack models in the decision-based black-box setting due to computation and time budgets.

Evaluation Metrics
We evaluate the attack models in terms of attack success rate, attack efficiency, and the quality of adversarial samples. (1) Attack success rate is the percentage of adversarial samples that successfully fool the victim model. (2) Attack efficiency is the average number of queries to the victim model needed to craft an adversarial sample. (3) We use four metrics, namely grammaticality, fluency, validity, and naturality, to evaluate adversarial sample quality. Specifically, we use LanguageTool to calculate the relative increase rate of grammar errors, GPT-2 (Radford et al., 2019) to compute adversarial samples' perplexity as a measure of fluency, and ask human annotators to evaluate validity and naturality.
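The first two metrics can be computed directly from per-sample attack records; a minimal sketch is below (the quality metrics require external tools such as LanguageTool and GPT-2 and are omitted).

```python
def attack_metrics(results):
    """Compute (attack success rate, average query count) from a list
    of (succeeded: bool, queries: int) attack records."""
    n = len(results)
    successes = sum(1 for ok, _ in results if ok)
    avg_queries = sum(q for _, q in results) / n
    return successes / n, avg_queries
```

Note that the average is taken over all attempted attacks, so inefficient failures raise the query metric even when they do not affect the success rate.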

Experimental Results
Attack Success Rate The attack success rate (ASR) results in the score-based attack setting are listed in Table 2, and the results in the decision-based setting are listed in Appendix C. In the score-based setting, MAYA consistently outperforms all baseline models across the three datasets and three victim models, and MAYA_π achieves attack success rates comparable to the baselines. In the decision-based setting, MAYA_π* overall outperforms the baselines, especially on AG's News, whose much longer sentences provide more constituents to perturb. These results demonstrate the advantage of our multi-granularity attack models.
Attack Efficiency For score-based attack, our models, especially MAYA_π, show great superiority over all baseline models. For decision-based attack, MAYA_π* significantly outperforms GAHard, which needs thousands of queries.
Furthermore, we measure the attack success rate of attack models under a restriction on the maximum query number. Figure 2 shows all attack models' success rates on SST-2 when attacking BERT under this restriction; Appendix D shows the remaining results for the three datasets and three victim models. We observe that our RL-based attack models significantly outperform all baselines under the query restriction, demonstrating the practicality of our attack models in real-world situations. To confirm this, we present results of attacking two open-source NLP frameworks in Appendix F.
Adversarial Sample Quality The results show that our multi-granularity attack models overall outperform all baseline models in terms of adversarial samples' fluency and the relative increase of grammar errors. Human evaluation results presented in Appendix G also confirm the high quality of our adversarial samples.

Constituent Selection
It is important to investigate which constituent types our multi-granularity attack models tend to select as the vulnerable parts of sentences, and the impact of different constituent types. We first investigate the selection frequency of all constituent types; the results are listed in Appendix E. We then select 7 constituent types that are relatively common and often selected as the vulnerable parts of sentences. We restrict MAYA's selection to each of these constituent types and evaluate the attacking performance. We also list the results of attacking with restriction to only words (BERT-Attack), with restriction to only phrases (Phrases), and with no restriction (All) for comparison (we refer readers to Taylor et al. (2003) for the meaning of the syntax tags). We can conclude from Table 3 that while word-level substitution (BERT-Attack) ensures attacking performance, there is still a significant gap between word-level attack and our multi-granularity attack (All). Besides, paraphrasing constituents improves sentence quality thanks to our strict restrictions and can produce adversarial samples with some probability.

Transferability
We investigate the transferability of adversarial samples produced by all attack models on SST-2 with BERT as the victim model. We do not consider SCPN in this study because it is model-agnostic and cannot be directly compared with the other attack models. We observe from Table 4 that our MAYA_π attack agent crafts adversarial samples with significantly higher transferability. This is probably because MAYA_π perturbs the sentence based not only on the outputs of victim models but also on its own predictions, enabling adversarial samples to capture vulnerabilities common to different victim models.

Impact of Imitation Algorithm
Despite the strong attacking performance of MAYA_π, the impact of our adapted imitation algorithm is unknown. One may attribute the success to the capacity of the multi-granularity attack model, ignoring the contribution of the imitation learning process. We therefore investigate the impact of our adapted imitation algorithm in this section. We employ a randomly initialized MAYA_π, without interaction with local victim models, to launch attacks against BERT on SST-2. From Table 5, we can conclude that the imitation learning process does bring useful knowledge to our attack agent.

Limitation
In developing our imitation algorithm, we make the strong assumption that the victim models' architectures are already known, which is unrealistic in real-world situations. However, as mentioned in Section 4.4, our attack agent can successfully launch adversarial attacks against real-world NLP frameworks, which confirms the practicability of our imitation algorithm; the results are presented in Appendix F. Further, we investigate the attacking performance of our attack agents on the SST-2 dataset when the victim model's architecture is unknown. Specifically, we employ the attack agent trained by interacting with our local BERT model, denoted as MAYA_π^B, to launch adversarial attacks against BiLSTM and RoBERTa, and compare with the originally trained attack agent, denoted as MAYA_π^o. We observe from Table 6 that MAYA_π^B achieves similar attacking performance while maintaining attack efficiency and adversarial sample quality, especially on RoBERTa. This is probably due to the common features and architectures shared by pre-trained models, which strongly supports our view that our attack agents can cause significant drops in models' prediction accuracy even when the victim models' architectures are unknown: we can simply assume the black-box system is built on pre-trained models, which holds in most cases.

Conclusion and Future Work
In this paper, we propose a multi-granularity adversarial attack model (MAYA) and an RL-based method to train an attack agent (MAYA_π) through behavior cloning with the expert knowledge from our MAYA algorithm. Further, we adapt MAYA_π to the decision-based attack setting, handling the issues of attacking models that only output decisions. Experimental results show that our attack models achieve overall higher attacking performance and produce more fluent and grammatical adversarial samples. We also show that MAYA_π can launch adversarial attacks on open-source NLP frameworks, demonstrating the practicability of our attack agent in real-world situations.
In the future, we will focus on improving current models' robustness to multi-granularity attacks. In addition, we will try to apply MAYA_π to other less investigated settings in textual adversarial attack, such as targeted attacks in the decision-based setting.

Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61602197. Also, we thank all the anonymous reviewers for their valuable comments and suggestions.

Ethical Considerations
In this section, we discuss the potential broader impact and ethical considerations of our paper.
Intended use. In this paper, we propose multi-granularity attacking models that can handle different attack settings with superior performance. Our motivations are twofold. First, the experimental results offer insights into current black-box machine learning models that can help us move towards explainable AI. Second, we demonstrate the potential risks of deploying current models in the real world, encouraging the research community to develop more robust models.
Potential risk. Our attacking models might be maliciously used to launch adversarial attacks against off-the-shelf commercial systems. However, as research on adversarial attacks in computer vision has shown, it is important that the research community become aware of powerful attack models before defending against them, so studying and investigating adversarial attacks is significant.
Energy saving. We present the details of our training process in Appendix J to prevent unnecessary hyper-parameter tuning and to help researchers quickly reproduce our results. We will also release checkpoints, including all victim models and our attack agents, to avoid the energy cost of re-training them.

D Attack Efficiency
In this section, we show the attack-efficiency results for the three datasets and three victim models. Figures 3 and 4 show the results of attacking BiLSTM and RoBERTa on SST-2. Figures 5-7 show the results of attacking BiLSTM, BERT, and RoBERTa on MNLI. Figures 8-10 show the results of attacking BiLSTM, BERT, and RoBERTa on AG's News.
The attack agents used against real-world frameworks are trained by interacting with BERT trained on the SST-2 dataset. Notice that we conduct a score-based adversarial attack on the AllenNLP sentiment analysis model and a decision-based attack on the Stanza model, according to the outputs of the victim models. Table 9 shows the results. We observe that our two attack agents function well in real-world situations in the two different attack settings and produce high-quality adversarial samples, showing the potential vulnerability of current NLP systems.

G Human Evaluation
We set up a human evaluation to further assess the quality of our adversarial samples. Following Zang et al. (2020), we consider 2 evaluation metrics: validity and naturality. Due to the large number of baseline models, we directly compare our crafted adversarial samples with the original samples. We randomly sample 100 original sentences from the SST-2 dataset and 100 adversarial samples crafted by MAYA on SST-2, and mix them. For each sentence, we ask 3 human annotators to perform the normal sentiment classification task and to score the sentence's naturality from 1 to 5. We use a voting strategy to produce the validity annotation for each adversarial sample: we measure the annotators' accuracy on original and adversarial samples respectively, and view the difference in accuracy as an indicator of the adversarial samples' validity. We average the 3 annotators' naturality scores to get the final results.

Sample I
Original Sentence: The second-ranked Jayhawks can redeem themselves for one of their most frustrating losses last season Monday when they welcome the Wolf Pack to Allen Fieldhouse.
Adversarial Sentence: The second-ranked Jayhawks can redeem themselves for one of their most frustrating losses last season Monday when the Wolf Pack is welcomed to the semifinals.

Sample II
Original Sentence: If the playoffs opened right now, instead of next month, the A #39 ; s would face the Red Sox in the first round -again. Boston bounced Oakland out of the postseason in five games last year, coming back from a 2-0 deficit to do so.
Adversarial Sentence: If the playoffs opened right now, instead of next month, the A #39 . s would face the Red Sox in the first round -once again. Boston bounced Oakland out of the postseason in five games last year, coming back from a deficit.

As shown in Table 10, the close gap in accuracy between original and adversarial samples indicates that our adversarial samples maintain high validity. Besides, our adversarial samples also achieve high naturality, consistent with the automatic evaluation metrics in our main experiments.

H Case Study
We select 2 successful adversarial samples crafted by our multi-granularity attack models on AG's News; note that all baseline attack models fail on these two samples. We can observe from Table 7 that the strength of our models is twofold. First, as Sample I shows, our attack models can take different granularities into consideration, yielding a bigger search space and more diversified adversarial samples. Second, as Sample II shows, our attack models can combine perturbations of different granularities to launch a stronger adversarial attack.

I Implementation Details
We use standard Adam (Kingma and Ba, 2015) to train our agent and consistently set the learning rate to 2e-5, because our training process is based on data aggregation, which makes the training data abundant. We set the batch size to 16.
Due to limited GPU memory and computation resources, we use a trick to obtain average batch gradients. Given a batch of original sentences [S_1, ..., S_i, ..., S_n], we input the sentences one by one to our attack agent together with all adversarial candidates of S_i, and compute the cross-entropy loss l_i against the golden label from the MAYA algorithm. Denoting the number of adversarial candidates of S_i as k_i, we obtain the weighted loss L_i by multiplying l_i with k_i:

L_i = k_i · l_i.

We then directly perform back-propagation to obtain the gradients with respect to each parameter θ:

g_i = ∂L_i / ∂θ.

We save the gradients and repeat the above operations to accumulate them. Finally, we have:

g = (Σ_i g_i) / (Σ_i k_i).

When the batch size is reached, we normalize the accumulated gradients in this way and update the parameters.
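The accumulation trick can be sketched numerically as follows. This is a NumPy sketch with hypothetical gradient vectors; in practice the gradients come from back-propagation through the agent, and the normalization by the total candidate count is our reading of the paper's unspecified "normalize" step.

```python
import numpy as np

def accumulate_weighted_grads(per_sentence):
    """Accumulate per-sentence gradients weighted by candidate count.

    Each item is (grad_of_l_i, k_i): the gradient of the per-sentence
    cross-entropy loss l_i and the number of adversarial candidates.
    Scaling by k_i implements L_i = k_i * l_i; dividing by the total
    candidate count at the end yields the normalized batch gradient."""
    total = None
    total_k = 0
    for grad_li, k_i in per_sentence:
        g = k_i * np.asarray(grad_li, dtype=float)   # gradient of L_i
        total = g if total is None else total + g    # accumulate
        total_k += k_i
    return total / total_k                           # normalize
```

Processing one sentence at a time keeps peak memory at a single sentence's candidate set while still producing a batch-averaged update.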