Certified Robustness to Programmable Transformations in LSTMs

Deep neural networks for natural language processing are fragile in the face of adversarial examples—small input perturbations, like synonym substitution or word duplication, which cause a neural network to change its prediction. We present an approach to certifying the robustness of LSTMs (and extensions of LSTMs) and training models that can be efficiently certified. Our approach can certify robustness to intractably large perturbation spaces defined programmatically in a language of string transformations. Our evaluation shows that (1) our approach can train models that are more robust to combinations of string transformations than those produced using existing techniques; (2) our approach can show high certification accuracy of the resulting models.


Introduction
Adversarial examples are small perturbations of an input that fool a neural network into changing its prediction (Carlini and Wagner, 2017; Szegedy et al., 2014). In NLP, adversarial examples involve modifying an input string by, for example, replacing words with synonyms, deleting stop words, inserting words, etc. (Ebrahimi et al., 2018; Zhang et al., 2019).
Ideally, a defense against adversarial examples in NLP tasks should fulfill the following desiderata: (1) Handle recursive models, like LSTMs and extensions thereof, which are prevalent in NLP. (2) Construct certificates (proofs) of robustness. (3) Defend against arbitrary string transformations, like combinations of word deletion, insertion, etc.
It is quite challenging to fulfill all three desiderata; indeed, existing techniques are forced to make tradeoffs. For instance, the theoretical insights underlying a number of certification approaches are intimately tied to symbol substitution (Jia et al., 2019; Huang et al., 2019; Ye et al., 2020; Xu et al., 2020; Dong et al., 2021), and some techniques cannot handle recursive models (Huang et al., 2019). On the other hand, techniques that strive to be robust to arbitrary string transformations achieve this at the expense of certification (Ebrahimi et al., 2018).
In this paper, we ask: Can we develop a certified defense to arbitrary string transformations that applies to recursive neural networks?
Our approach. Certifying robustness involves proving that a network's prediction is the same no matter how a given input string is perturbed. We assume that the perturbation space is defined as a program describing a set of possible string transformations, e.g., if you see the word "movie", replace it with "film" or "movies". Such transformations can succinctly define a perturbation space that is exponentially large in the length of the input, so certification by enumerating the perturbation space is generally impractical.
We present ARC (Abstract Recursive Certification), an approach for certifying robustness to programmatically defined perturbation spaces. ARC can be used within an adversarial training loop to train robust models. We illustrate the key ideas behind ARC through a simple example. Consider the (partial) input sentence to the movie..., and say we are using an LSTM for prediction. Say we have two string transformations: (T1) If you see the word movie, you can replace it with film or movies. (T2) If you see the word the or to, you can delete it. ARC avoids enumerating the large perturbation space ( Fig. 1(a)) using two key insights.
Memoization: ARC exploits the recursive structure of LSTM networks, and their extensions (BiLSTMs, TreeLSTMs), to avoid recomputing intermediate hidden states. ARC memoizes hidden states of prefixes shared across multiple sequences in the perturbation space. For example, the two sentences to the movie... and to the film... share the prefix to the, and therefore we memoize the hidden state after the word the, as illustrated in Fig. 1(b) with dashed blue lines. The critical challenge is characterizing which strings share prefixes without having to explicitly explore the perturbation space.
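The prefix-sharing idea can be sketched in a few lines. Here the LSTM cell is a toy stand-in (the "state" is just the tuple of symbols read so far), so the savings from memoization are easy to count; none of the names below come from a released ARC API.

```python
# Toy stand-in for an LSTM cell: the "state" is the tuple of symbols
# consumed so far, which makes prefix sharing easy to observe.
def lstm_cell(symbol, state):
    return state + (symbol,)

calls = 0
cache = {}

def state_of(prefix):
    """Return the state after reading `prefix`, memoizing shared prefixes."""
    global calls
    if prefix in cache:
        return cache[prefix]
    if not prefix:
        state = ()  # h_0
    else:
        state = lstm_cell(prefix[-1], state_of(prefix[:-1]))
        calls += 1
    cache[prefix] = state
    return state

# Two perturbed sentences sharing the prefix ("to", "the"):
state_of(("to", "the", "movie"))
state_of(("to", "the", "film"))
# Only 4 cell evaluations instead of 6: the 2-step prefix is computed once.
print(calls)  # 4
```

The same counting argument is what makes ARC's bottom-up computation linear rather than exponential in the number of shared prefixes.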
Abstraction: ARC uses abstract interpretation (Cousot and Cousot, 1977) to symbolically represent sets of perturbed strings, avoiding a combinatorial explosion. Specifically, ARC represents a set of strings as a hyperrectangle in R^n and propagates the hyperrectangle through the network using interval arithmetic. This idea is illustrated in Fig. 1(b), where the words film and movies are represented as a hyperrectangle. By joining hidden states of different sentences (a common idea in program analysis), ARC can perform certification efficiently.
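To make the hyperrectangle idea concrete, here is a minimal sketch of abstracting a set of word vectors into a per-dimension interval box; the 2-d embeddings are made up for illustration.

```python
# Hypothetical 2-d embeddings for the two replacement words.
emb = {
    "film":   (0.9, -0.2),
    "movies": (0.7,  0.1),
}

def box(vectors):
    """Tightest hyperrectangle (per-dimension [min, max]) containing the vectors."""
    dims = range(len(vectors[0]))
    lo = tuple(min(v[d] for v in vectors) for d in dims)
    hi = tuple(max(v[d] for v in vectors) for d in dims)
    return lo, hi

lo, hi = box(list(emb.values()))
print(lo, hi)  # (0.7, -0.2) (0.9, 0.1)
# Every concrete embedding lies inside the box:
assert all(all(lo[d] <= v[d] <= hi[d] for d in range(2)) for v in emb.values())
```

Propagating the box instead of each vector is what lets one LSTM evaluation cover many perturbed strings at once.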
Memoization and abstraction enable ARC to efficiently certify robustness to very large perturbation spaces. Note that ARC subsumes Xu et al. (2020) because ARC can certify arbitrary string transformations, while Xu et al. (2020) only works on word substitutions.
Contributions. We make the following contributions: (1) We present ARC, an approach for training certifiably robust recursive neural networks. We demonstrate our approach on LSTMs, BiLSTMs, and TreeLSTMs. (2) We present a novel application of abstract interpretation to symbolically capture a large space of strings, defined programmatically, and propagate it through a recursive network (Section 4).

[Table 1: perturbation spaces supported by related approaches. Ko et al. (2019a): lp norm; Huang et al. (2019): substitution; Jia et al. (2019): substitution; Ye et al. (2020): substitution (probabilistic); Xu et al. (2020): substitution; Dong et al. (2021): substitution; ARC (this paper): arbitrary.]
(3) Our evaluation shows that ARC can train models that are more robust to arbitrary perturbation spaces than those produced by existing techniques; ARC can show high certification accuracy of the resulting models; and ARC can certify robustness to attacks (transformations) that are out-of-scope for existing techniques (Section 5). Table 1 compares ARC to most related approaches.
Certification of robustness in NLP. See Li et al. (2020) for a survey of robust training. Some works focus on certifying the l_p norm ball of each word embedding for LSTMs (Ko et al., 2019b; Jacoby et al., 2020) and transformers. Others focus on certifying word substitutions for CNNs (Huang et al., 2019) and LSTMs (Jia et al., 2019; Xu et al., 2020), and word deletions for the decomposable attention model (Welbl et al., 2020). Existing techniques rely on abstract interpretation, such as IBP and CROWN (Zhang et al., 2018). We focus on certifying the robustness of LSTM models (including TreeLSTMs) to a programmable perturbation space, which is out of scope for existing techniques. Note that Xu et al. (2020) also uses memoization and abstraction to certify, but ARC subsumes Xu et al. (2020) because ARC can certify arbitrary string transformations. We use IBP, but our approach can use other abstract domains, such as zonotopes. SAFER (Ye et al., 2020) is a model-agnostic approach that uses randomized smoothing (Cohen et al., 2019) to give probabilistic certificates of robustness to word substitution. Our approach gives a non-probabilistic certificate and can handle arbitrary perturbation spaces beyond substitution.
Robustness techniques in NLP. Adversarial training is an empirical defense method that can improve the robustness of models by solving a robust-optimization problem (Madry et al., 2018), which minimizes the worst-case (adversarial) loss. Some techniques in NLP use adversarial attacks to compute a lower bound on the worst-case loss (Ebrahimi et al., 2018; Michel et al., 2019). ASCC (Dong et al., 2021) overapproximates the word-substitution attack space by a convex hull and computes a lower bound on the worst-case loss using gradients. Other techniques compute upper bounds on the adversarial loss using abstract interpretation. Huang et al. (2019), Jia et al. (2019), and Xu et al. (2020) used abstract interpretation to train CNN and LSTM models against word substitutions. A3T (Zhang et al., 2020) trains robust CNN models against a programmable perturbation space by combining adversarial training and abstraction. Our approach uses abstract interpretation to train robust LSTMs against programmable perturbation spaces as defined by Zhang et al. (2020).

Robustness Problem and Preliminaries
We consider a classification setting with a neural network F_θ with parameters θ, trained on samples from domain X and labels from Y. The domain X is the set of strings over a finite set of symbols Σ (e.g., English words or characters), i.e., X = Σ*. We use x ∈ Σ* to denote a string; x_i ∈ Σ to denote the ith element of the string; x_{i:j} to denote the substring x_i, ..., x_j; ε to denote the empty string; and LEN_x to denote the length of the string.
Robustness to string transformations. A perturbation space S is a function in Σ * → 2 Σ * , i.e., S takes a string x and returns a set of possible perturbed strings obtained by modifying x. Intuitively, S(x) denotes a set of strings that are semantically similar to x and therefore should receive the same prediction. We assume x ∈ S(x).
Given a string x with label y and a perturbation space S, we say that a neural network F_θ is S-robust on (x, y) iff

∀z ∈ S(x). F_θ(z) = y    (1)

Our primary goal in this paper is to certify, or prove, S-robustness (Eq 1) of the neural network for a pair (x, y). Given a certification approach, we can then use it within an adversarial training loop to yield certifiably robust networks.
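For intuition only, S-robustness can be checked by brute force when S(x) is small enough to enumerate (the paper's whole point is to avoid this enumeration). The classifier and the hand-written perturbation space below are toys of our own.

```python
# Brute-force check of Eq 1: the model must predict y on every z in S(x).
def is_s_robust(model, S, x, y):
    return all(model(z) == y for z in S(x))

# Toy keyword classifier (ours, not the paper's model).
def toy_model(z):
    return "pos" if any(w in z for w in ("movie", "movies", "film")) else "neg"

# Tiny hand-written perturbation space for this one sentence.
def S(x):
    return {x,
            ("to", "the", "film"),
            ("the", "movie"),
            ("to", "movie")}

print(is_s_robust(toy_model, S, ("to", "the", "movie"), "pos"))  # True
```

Since real perturbation spaces are exponentially large, ARC replaces this loop with the memoized, abstracted computation of Section 4.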
Robustness certification. We will certify S-robustness by solving an adversarial loss objective:

max_{z ∈ S(x)} L_θ(z, y)    (2)

where we assume that the loss function L_θ is < 0 when F_θ(z) = y and ≥ 0 when F_θ(z) ≠ y. Therefore, if we can show that the solution to the above problem is < 0, then we have a certificate of S-robustness.
Certified training. If we have a procedure to compute the adversarial loss, we can use it for adversarial training by solving the following robust-optimization objective (Madry et al., 2018), where D is the data distribution:

min_θ E_{(x,y)∼D} [ max_{z ∈ S(x)} L_θ(z, y) ]    (3)

Programmable Perturbation Spaces
In our problem definition, we assumed an arbitrary perturbation space S. We adopt the recently proposed specification language of Zhang et al. (2020) to define S programmatically as a set of string transformations. The language is very flexible, allowing the definition of a rich class of transformations via match and replace functions.

Single transformations. A string transformation
T is a pair (ϕ, f), where ϕ : Σ^s → {0, 1} is the match function, a Boolean function that specifies the substrings (of length s) to which the transformation can be applied; and f : Σ^s → 2^(Σ^t) is the replace function, which specifies how the substrings matched by ϕ can be replaced (with strings of length t). We call s and t the sizes of the domain and range of the transformation T, respectively.
Example 3.1. In all examples, the set of symbols Σ is English words, so strings are English sentences. Let T_del be a string transformation that deletes the stop words "to" and "the". Formally, T_del = (ϕ_del, f_del), where ϕ_del : Σ^1 → {0, 1} and f_del : Σ^1 → 2^(Σ^0) are:

ϕ_del(x) = 1 if x ∈ {"to", "the"}, 0 otherwise;  f_del(x) = {ε}

Let T_sub be a transformation substituting the word "movie" with "movies" or "film". Formally, T_sub = (ϕ_sub, f_sub), where ϕ_sub : Σ^1 → {0, 1} and f_sub : Σ^1 → 2^(Σ^1) are:

ϕ_sub(x) = 1 if x = "movie", 0 otherwise;  f_sub(x) = {"film", "movies"}

Defining perturbation spaces. We can compose different string transformations to construct a perturbation space S:

S = {(T_1, δ_1), ..., (T_n, δ_n)}    (4)

where each T_i denotes a string transformation that can be applied up to δ_i ∈ N times. Transformations can be applied whenever they match a non-overlapping set of substrings, which are then transformed in parallel. We illustrate with an example and refer to Zhang et al. (2020) for the formal semantics.
Example 3.2. Let S = {(T del , 1), (T sub , 1)} be a perturbation space that applies T del and T sub to the given input sequence up to once each. If x ="to the movie", a subset of the perturbation space S(x) is shown in Fig. 1(a).
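The match/replace pairs of Example 3.1 can be sketched directly; the encoding below (sentences as tuples of words, a transformation as a pair of functions) is ours for illustration, not a released ARC API.

```python
# A transformation is a pair (phi, f): phi matches length-s substrings,
# f maps a matched substring to a set of length-t replacements.
T_del = (
    lambda seg: seg in (("to",), ("the",)),   # phi_del : Sigma^1 -> {0,1}
    lambda seg: {()},                          # f_del deletes the word (t = 0)
)
T_sub = (
    lambda seg: seg == ("movie",),             # phi_sub
    lambda seg: {("film",), ("movies",)},      # f_sub
)

def apply_once(T, x):
    """All strings obtained by applying T at exactly one matching position."""
    phi, f = T
    out = set()
    for i in range(len(x)):                    # s = 1 for both examples
        seg = x[i:i + 1]
        if phi(seg):
            for rep in f(seg):
                out.add(x[:i] + rep + x[i + 1:])
    return out

x = ("to", "the", "movie")
print(sorted(apply_once(T_sub, x)))
# [('to', 'the', 'film'), ('to', 'the', 'movies')]
print(len(apply_once(T_del, x)))  # 2: delete "to" or delete "the"
```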
Decomposition. A perturbation space S = {(T_i, δ_i)}_i can be decomposed into ∏_i (δ_i + 1) subset perturbation spaces by considering all smaller combinations of the δ_i. We denote the decomposition of perturbation space S as DEC_S, and exemplify below:

DEC_{{(T_del, 1), (T_sub, 1)}} = { ∅, {(T_del, 1)}, {(T_sub, 1)}, {(T_del, 1), (T_sub, 1)} }

where ∅ is the perturbation space with no transformations, i.e., if S = ∅, then S(x) = {x} for any string x.
We use the notation S_{k↓} to denote S after reducing δ_k by 1; therefore, S_{k↓} ∈ DEC_S.
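A possible sketch of the decomposition DEC_S, enumerating every combination of smaller budgets with itertools.product (representing a space as a tuple of (name, budget) pairs is our own encoding):

```python
from itertools import product

def decompose(S):
    """DEC_S: all perturbation spaces {(T_i, d_i)} with 0 <= d_i <= delta_i."""
    names, deltas = zip(*S)
    return [tuple(zip(names, ds))
            for ds in product(*(range(d + 1) for d in deltas))]

S = [("T_del", 1), ("T_sub", 1)]
for sub in decompose(S):
    print(sub)
print(len(decompose(S)))  # (1+1)*(1+1) = 4 subspaces, including the empty one
```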

The LSTM Cell
We focus our exposition on LSTMs. An LSTM cell is a function, denoted LSTM, that takes as input a symbol x_i and the previous hidden and cell states, and outputs the hidden and cell states at the current time step. For simplicity, we use h_i to denote the concatenation of the hidden and cell states at the ith time step, and simply refer to it as the state. Given a string x, we define h_i as follows:

h_i = LSTM(x_i, h_{i-1})

where h_0 = 0^d and d is the dimensionality of the state. For a string x of length n, we say that h_n is the final state.
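For concreteness, here is a plain NumPy LSTM cell implementing the recurrence h_i = LSTM(x_i, h_{i-1}); the weight shapes, gate ordering, and initialization are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d, e = 3, 2                      # state and embedding dimensionality (made up)
W = rng.normal(size=(4 * d, e))  # input weights, gates stacked [i; f; g; o]
U = rng.normal(size=(4 * d, d))  # recurrent weights
b = np.zeros(4 * d)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_cell(x, h, c):
    """One standard LSTM step on embedding x and previous (h, c)."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Run the recurrence over a 4-symbol string of random embeddings.
h, c = np.zeros(d), np.zeros(d)  # h_0 = 0^d
for x in rng.normal(size=(4, e)):
    h, c = lstm_cell(x, h, c)
print(h.shape)  # (3,) -- the final state h_n
```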

ARC: Abstract Recursive Certification
In this section, we present our technique for proving S-robustness of an LSTM on (x, y). Formally, we do this by computing the adversarial loss, max_{z ∈ S(x)} L_θ(z, y). Recall that if the solution is < 0, then we have proven S-robustness. To solve for the adversarial loss optimally, we effectively need to evaluate the LSTM on all of S(x) and collect the set of all final states:

F = { h^z_{LEN_z} | z ∈ S(x) }    (5)

where h^z_i denotes the state after reading the first i symbols of z. Computing F precisely is challenging, as S(x) may be prohibitively large. Therefore, we propose to compute a superset of F, which we call F̂. This superset yields an upper bound on the adversarial loss; we prove S-robustness if the upper bound is < 0.
To compute F̂, we present two key ideas that go hand in hand. In Section 4.1, we observe that strings in the perturbation space share common prefixes, and therefore we can memoize hidden states to reduce the number of evaluations of LSTM cells, a form of dynamic programming. We carefully derive the set of final states F as a system of memoizing equations. The challenge of this derivation is characterizing which strings share common prefixes without explicitly exploring the perturbation space. In Section 4.2, we apply abstract interpretation to efficiently and soundly solve the system of memoizing equations, thus computing an overapproximation F̂ ⊇ F.

Memoizing Equations of Final States
Tight Perturbation Space. Given a perturbation space S, we shall use S^= to denote the tight perturbation space in which each transformation T_j in S is applied exactly δ_j times.
Think of the set of all strings in a perturbation space as a tree, like in Fig. 1(b), where strings that share prefixes share LSTM states. We want to characterize the set H^S_{i,j} of LSTM states at the ith position of perturbed prefixes obtained by applying all transformations in a tight space S to the original prefix x_{1:j}. We formally define H^S_{i,j} as follows:

H^S_{i,j} = { h^z_i | z ∈ S^=(x_{1:j}), LEN_z = i }    (6)

For example, for S = {(T_del, 1)} and x = "to the movie", the set H^S_{1,2} contains the states LSTM("the", h_0) and LSTM("to", h_0); these states result from deleting the first and second words of the prefix "to the", respectively.

Memoizing equation. We now demonstrate how to rewrite Eq 6 by explicitly applying the transformations defining the perturbation space S. Notice that each H^S_{i,j} comes from two sets of strings: (1) strings whose suffix (the last symbol) is not perturbed by any transformation (the first line of Eq 8), and (2) strings whose suffix is perturbed by some T_k = (ϕ_k, f_k) (the second line of Eq 8), as illustrated in Fig 2. Thus, we derive the final equation and then immediately show an example:

H^S_{i,j} = { LSTM(x_j, h) | h ∈ H^S_{i-1,j-1} }
          ∪ ⋃_{k : (T_k, δ_k) ∈ S, δ_k ≥ 1, ϕ_k(x_{a:b}) = 1} { LSTM*(z, h) | z ∈ f_k(x_{a:b}), h ∈ H^{S_{k↓}}_{i-t_k, j-s_k} }    (8)

where a = j − s_k + 1, b = j, and LSTM*(z, h) denotes running the LSTM cell over the symbols of z starting from state h.
We compute Eq 8 in a bottom-up fashion, starting from H^∅_{0,0} = {0^d} and increasing i and j, considering every possible perturbation space in the decomposition of S, DEC_S. We demonstrate how to derive states from Eq 8. For S = {(T_del, 1)} and x = "to the movie", instantiating Eq 8 gives:

H^S_{1,2} = { LSTM("the", h) | h ∈ H^S_{0,1} } ∪ { LSTM*(z, h) | z ∈ f_del("the"), h ∈ H^∅_{1,1} }    (9)

The first line of Eq 9 evaluates to {LSTM("the", h_0)}, which corresponds to deleting the first word of the prefix "to the". Because z can only be the empty string, the second line of Eq 9 evaluates to {LSTM("to", h_0)}, which corresponds to deleting the second word of "to the". The dashed green line in Fig. 1(b) shows the computation of Eq 9.
Defining Final States using Prefixes. Finally, we compute the set of final states F by considering all perturbation spaces in the decomposition of S:

F = ⋃_{S' ∈ DEC_S} ⋃_i H^{S'}_{i, LEN_x}    (10)
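The memoizing equations can be sanity-checked on the running example. In the toy below the LSTM "state" is the perturbed prefix itself, so the set of final states computed bottom-up via Eq 8/Eq 10 must coincide with the brute-force enumeration of S(x); all encodings and names are ours, for illustration only.

```python
from itertools import product

TRANS = [
    # (phi, f, s, t): match fn, replace fn, domain size, range size
    (lambda seg: seg[0] in ("to", "the"), lambda seg: [()], 1, 0),                # T_del
    (lambda seg: seg[0] == "movie", lambda seg: [("film",), ("movies",)], 1, 1),  # T_sub
]

def final_states(x, deltas):
    n = len(x)
    spaces = list(product(*(range(d + 1) for d in deltas)))  # DEC_S as count tuples
    H = {(spaces[0], 0, 0): {()}}                            # H^{emptyset}_{0,0} = {h_0}
    for j in range(1, n + 1):
        for sp in spaces:
            for i in range(0, n + 1):
                states = set()
                # Line 1 of Eq 8: the last symbol x_j is unperturbed.
                for h in H.get((sp, i - 1, j - 1), ()):
                    states.add(h + (x[j - 1],))
                # Line 2: the suffix is produced by some T_k with phi_k matching.
                for k, (phi, f, s, t) in enumerate(TRANS):
                    if sp[k] >= 1 and j >= s and i >= t and phi(x[j - s:j]):
                        down = sp[:k] + (sp[k] - 1,) + sp[k + 1:]  # S_{k downarrow}
                        for h in H.get((down, i - t, j - s), ()):
                            for z in f(x[j - s:j]):
                                states.add(h + z)
                if states:
                    H[(sp, i, j)] = states
    # Eq 10: union of final columns over the whole decomposition.
    return set().union(*(H.get((sp, i, n), set())
                         for sp in spaces for i in range(n + 1)))

def brute_force(x, deltas):
    out = set()
    def go(j, prefix, rem):
        if j == len(x):
            out.add(prefix)
            return
        go(j + 1, prefix + (x[j],), rem)  # leave x_j unperturbed
        for k, (phi, f, s, t) in enumerate(TRANS):
            if rem[k] >= 1 and phi(x[j:j + s]):
                for z in f(x[j:j + s]):
                    go(j + s, prefix + z, rem[:k] + (rem[k] - 1,) + rem[k + 1:])
    go(0, (), deltas)
    return out

x = ("to", "the", "movie")
F = final_states(x, (1, 1))
assert F == brute_force(x, (1, 1))
print(len(F))  # 9 perturbed strings, computed without enumerating per prefix
```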

Abstract Memoizing Equations
Memoization avoids recomputing hidden states, but it still incurs a combinatorial explosion. We employ abstract interpretation (Cousot and Cousot, 1977) to solve the equations efficiently by overapproximating the set F . See Albarghouthi (2021) for details on abstractly interpreting neural networks.
Abstract Interpretation. The interval domain, also known as interval bound propagation (IBP), allows us to evaluate a function on an infinite set of inputs represented as a hyperrectangle in R^n.
Interval domain. We define the interval domain over scalars; the extension to vectors is standard. We will use an interval [l, u] ⊂ R, where l, u ∈ R and l ≤ u, to denote the set of all real numbers between l and u, inclusive.
For a finite set X ⊂ R, the abstraction operator gives the tightest interval containing X, as follows: α(X) = [min(X), max(X)]. Abstraction allows us to compactly represent a large set of strings.
The join operation, ⊔, produces the smallest interval containing two intervals: [l, u] ⊔ [l', u'] = [min(l, l'), max(u, u')]. We will use joins to merge hidden states resulting from different strings in the perturbation space (recall Fig. 1(b)).

Interval transformers. To evaluate a neural network on intervals, we lift neural-network operations to interval arithmetic, yielding abstract transformers. For a function g, we use ĝ to denote its abstract transformer. We use the transformers proposed by Jia et al. (2019). We illustrate the transformers for addition and for any monotonically increasing function g : R → R (e.g., ReLU, tanh):

[l, u] + [l', u'] = [l + l', u + u'],    ĝ([l, u]) = [g(l), g(u)]
Note how, for a monotonic function g, the abstract transformer ĝ simply applies g to the lower and upper bounds. An abstract transformer ĝ must be sound: for any interval [l, u] and x ∈ [l, u], we must have g(x) ∈ ĝ([l, u]). We use L̂STM to denote an abstract transformer of an LSTM cell; it takes an interval of symbol embeddings and an interval of states. We use the definition of L̂STM given by Jia et al. (2019).
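A minimal sketch of the interval domain, its abstraction and join operators, and a sound transformer for a monotone function (the class design is ours):

```python
import math

class Interval:
    """Interval domain over scalars: [l, u] with l <= u."""
    def __init__(self, l, u):
        assert l <= u
        self.l, self.u = l, u
    def join(self, other):
        """Smallest interval containing both intervals."""
        return Interval(min(self.l, other.l), max(self.u, other.u))

def alpha(xs):
    """Abstraction: tightest interval containing the finite set xs."""
    return Interval(min(xs), max(xs))

def monotone_hat(g, box):
    """Abstract transformer for monotonically increasing g: apply g to the bounds."""
    return Interval(g(box.l), g(box.u))

box = alpha([-1.0, 0.25, 2.0])          # [-1, 2]
out = monotone_hat(math.tanh, box)
# Soundness: g(x) lies in g_hat([l, u]) for every x in the abstracted set.
assert all(out.l <= math.tanh(x) <= out.u for x in [-1.0, 0.25, 2.0])
print(round(out.l, 3), round(out.u, 3))  # -0.762 0.964
```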
Abstract Memoizing Equations. We now show how to solve Eq 8 and Eq 10 using abstract interpretation, by rewriting the equations using operations over intervals:

Ĥ^S_{i,j} = L̂STM(α({x_j}), Ĥ^S_{i-1,j-1}) ⊔ ⨆_{k : ϕ_k(x_{a:b}) = 1} L̂STM(α(f_k(x_{a:b})), Ĥ^{S_{k↓}}_{i-t_k, j-s_k})

where a and b are the same as in Eq 8. The two key ideas are (1) representing sets of possible LSTM inputs abstractly as intervals, using α; and (2) joining intervals of states, using ⊔. These two ideas ensure that we efficiently solve the system of equations, producing an overapproximation F̂ ⊇ F.
The above abstract equations give us a compact overapproximation of F that can be computed in a number of steps that is linear in the length of the input. Even though there can be O(LEN_x^2) sets H^S_{i,j} for a given S, only O(LEN_x) of them are non-empty. This property is used in Theorem 4.2 and is proved in the appendix.
For practical perturbation spaces (see Section 5), the quantity n ∏_{i=1}^n δ_i is typically small and can be considered constant.
Extension to Bi-LSTMs and Tree-LSTMs. A Bi-LSTM performs a forward and a backward pass on the input. The forward pass is the same as the forward pass in the original LSTM. For the backward pass, we reverse the input string x, the input of the match function ϕ i and the input/output of the replace function f i of each transformation.
A Tree-LSTM takes trees as input. We can define a programmable perturbation space over trees in the same form as Eq 4, where each T_i is a tree transformation. We show some examples of tree transformations in Fig 3. T_DelStop (Fig 3(a)) removes a leaf node containing a stop word; after the removal, the sibling of the removed node becomes the new parent node. T_Dup (Fig 3(b)) duplicates a word in a leaf node by first removing the word and expanding the leaf node with two children, each of which contains the previous word. T_SubSyn (Fig 3(c)) substitutes the word in a leaf node with one of its synonyms.
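A sketch of T_DelStop on binary trees encoded as nested tuples (a leaf is a word, an internal node is a pair of subtrees); the encoding and the stop-word set are our assumptions, not the paper's definitions.

```python
def del_stop(tree, stop=frozenset({"to", "the"})):
    """All trees obtained by deleting exactly one stop-word leaf.

    When a stop-word leaf is removed, its sibling replaces the parent node.
    """
    out = []
    if isinstance(tree, str):       # a bare leaf has nothing to delete under it
        return out
    left, right = tree
    if isinstance(left, str) and left in stop:
        out.append(right)           # sibling becomes the new parent
    if isinstance(right, str) and right in stop:
        out.append(left)
    out += [(t, right) for t in del_stop(left, stop)]
    out += [(left, t) for t in del_stop(right, stop)]
    return out

tree = (("to", ("the", "movie")), "rocks")
results = del_stop(tree)
for t in results:
    print(t)
# (('the', 'movie'), 'rocks')  -- deleted "to"
# (('to', 'movie'), 'rocks')   -- deleted "the"
```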
We provide the formalization of ARC on Bi-LSTMs and Tree-LSTMs in the appendix.

Evaluation
We implemented ARC in PyTorch. The source code is available online and provided in the supplementary materials.

Datasets. We evaluate on IMDB, SST (Socher et al., 2013), and SST2, a two-way class split of SST. IMDB and SST2 have reviews in sentence form and binary labels. SST has reviews in constituency-parse-tree form and five labels.
Perturbation Spaces. Following Zhang et al. (2020), we create perturbation spaces by combining the transformations in Table 2, e.g., {(T_DelStop, 2), (T_SubSyn, 2)} removes up to two stop words and replaces up to two words with synonyms. We also design a domain-specific perturbation space S_review for movie reviews; e.g., one transformation in S_review can duplicate question or exclamation marks because they usually appear repeatedly in movie reviews. We provide the detailed definition and evaluation of S_review in the appendix. For Tree-LSTMs, we consider the tree transformations exemplified in Fig 3.

[Table 3: example perturbations. An original SST2 sample: "i was perplexed to watch it unfold with an astonishing lack of passion or uniqueness ." (+ve). An original SST2 sample "this is pretty dicey material ." (-ve) is perturbed under {(T_Dup, 2), (T_SubSyn, 2)} into "this becomes pretty pretty dicey material . ." (+ve).]

Metrics. Exhaustive accuracy (EX Acc.) is the percentage of points in the test set that are S-robust (Eq 1). HotFlip accuracy (HF Acc.) is an upper bound of exhaustive accuracy; certified accuracy (CF Acc.) is a lower bound of exhaustive accuracy.
Baselines. For training certifiable models against arbitrary string transformations, we compare ARC to (1) normal training, which minimizes the cross-entropy loss, and (2) data augmentation, which augments the dataset with random samples from the perturbation space. For certification, we compare ARC to (1) POPQORN (Ko et al., 2019a), the state-of-the-art approach for certifying LSTMs, and (2) SAFER (Ye et al., 2020), which provides probabilistic certificates for word substitution.
Xu et al. (2020) is a special case of ARC where the perturbation space contains only substitutions. We provide a theoretical comparison in the appendix.

Arbitrary Perturbation Spaces
Comparison to Data Augmentation & HotFlip. We use the three perturbation spaces in Table 4 and the domain-specific perturbation space S review in Table 5.
ARC outperforms data augmentation and HotFlip in terms of EX Acc. and CF Acc.
Table 4 shows the results of LSTM, Tree-LSTM, and Bi-LSTM models on the three perturbation spaces, and Table 5 shows the results of LSTM models on the domain-specific perturbation space S_review. ARC has significantly higher EX Acc. than normal training (+8.1, +14.0, +8.7 on average), data augmentation (+4.2, +10.4, +5.0), and HotFlip (+3.6, +6.7, +10.4) for LSTM, Tree-LSTM, and Bi-LSTM respectively. Models trained with ARC have a relatively high CF Acc. (53.6 on average). Data augmentation and HotFlip result in models not amenable to certification; in some cases, almost nothing in the test set can be certified.
ARC produces more robust models at the expense of accuracy. Other robust training approaches like CertSub and A3T also exhibit this trade-off. However, as we will show next, ARC retains higher accuracy than these approaches.
Comparison to A3T. The LSTMs trained using ARC are more robust than the CNNs trained by A3T on both perturbation spaces, and ARC can certify the robustness of the resulting models while A3T cannot. Table 6 shows that ARC results in models with higher accuracy (+2.3 and +0.3), HF Acc. (+5.9 and +1.7), and EX Acc. (+6.8 and +6.3) than those produced by A3T.

Experiments on Word Substitution
We compare to works limited to word substitution.
Comparison to CertSub and ASCC. We choose two perturbation spaces, {(T_SubSyn, 1)} and {(T_SubSyn, 2)}. We train one model per perturbation space using ARC under the same experimental setup as CertSub: a BiLSTM on the IMDB dataset. By definition, CertSub and ASCC train for an arbitrary number of substitutions. CF Acc. is computed using ARC. Note that CertSub can only certify {(T_SubSyn, ∞)} and ASCC cannot certify at all.
ARC trains more robust models than CertSub for two perturbation spaces with word substitution. Table 7 shows that ARC achieves higher accuracy, CF Acc., and EX Acc. than CertSub on the two perturbation spaces.
ARC trains a more robust model than ASCC for {(T SubSyn , 1)}, but ASCC's model is more robust for {(T SubSyn , 2)}. Table 7 shows that the ARC-trained models have higher accuracy and CF Acc.
Comparison to POPQORN. We compare certification of an ARC-trained model and a normally trained model against {(T_SubSyn, 3)} on the first 100 examples of the SST2 dataset. Because POPQORN can only certify an l_p norm ball, we overapproximate the radius of the ball as the maximum l_1 distance between the original word and its synonyms.
ARC runs much faster than POPQORN. ARC is more accurate than POPQORN on the ARC-trained model, while POPQORN is more accurate on the normal model. ARC certification takes 0.17 sec/example on average for both models, while POPQORN certification takes 12.7 min/example. ARC achieves 67% and 5% CF Acc. on the ARC-trained model and the normal model, respectively.

Comparison to SAFER (Ye et al., 2020). SAFER is a post-processing technique for certifying robustness via randomized smoothing. We train a Bi-LSTM model using ARC following SAFER's experimental setup on the IMDB dataset and SAFER's synonym set, which differs from CertSub's. We consider the perturbation spaces {(T_SubSyn, 1)} and {(T_SubSyn, 2)} and use both ARC and SAFER to certify robustness. The significance level of SAFER is set to 1%. SAFER has a higher certified accuracy than ARC; however, its certificates are statistical, tied to word substitution only, and slower to compute. Considering {(T_SubSyn, 2)}, ARC results in a certified accuracy of 79.6 while SAFER results in 86.7 (see appendix). Note that the certified accuracies are incomparable because SAFER's certificates provide only statistical guarantees. Also, note that ARC uses O(n ∏_{i=1}^n δ_i) forward passes per sample, while SAFER needs to randomly sample thousands of times. In the future, it would be interesting to explore extensions of SAFER to ARC's rich perturbation spaces.
ARC maintains a reasonable CF Acc. for increasingly larger spaces. Fig 4 shows the results, along with the maximum perturbation-space size in the test set. ARC can certify 41.7% of the test set even when the perturbation space size grows to about 10^10. For δ = 1, 2, 3, the CF Acc. is lower than the EX Acc. (by 8.2 on average), while the HF Acc. is higher than the EX Acc. (by 5.6 on average). Note that ARC needs only a small amount of time to certify the entire test set, 3.6 min on average on a single V100 GPU, making it far more efficient than brute-force enumeration.

Conclusion
We present ARC, which uses memoization and abstract interpretation to certify the robustness of LSTMs to programmable perturbation spaces. ARC can be used to train models that are more robust than those trained using existing techniques and that handle more complex perturbation spaces. Last, the models trained with ARC have high certified accuracy, which can be established using ARC itself.

A.1 Proof of Lemmas and Theorems
Lemma 4.1
Proof. We prove Lemma 4.1 by induction on i, j, and S. Base case: H^∅_{0,0} = {h_0} is defined identically by Eq 6 and Eq 8.
Inductive step for H^S_{i,j}: suppose the lemma holds for all states with smaller indices. H^S_{i,j} in Eq 6 comes from the two cases illustrated in Fig 2, which are captured by the first and second lines of Eq 8, respectively. By the inductive hypothesis, the lemma holds on the states H^S_{i-1,j-1} and H^{S_{k↓}}_{i-t_k, j-s_k}; thus, the lemma also holds on H^S_{i,j}.

Theorem 4.1
Proof. We can expand Eq 5 using the decomposition of the perturbation space, S(x) = ⋃_{S' ∈ DEC_S} S'^=(x), as

F = ⋃_{S' ∈ DEC_S} { h^z_{LEN_z} | z ∈ S'^=(x) }    (11)

Eq 11 and Eq 10 are equivalent, leading to the equivalence of Eq 5 and Eq 10.

Theorem 4.2
Proof. We first show that Eq 7 is equivalent to a computation over perturbed prefixes of length at most

MAXLEN_x = LEN_x + Σ_{k=1}^n max(t_k − s_k, 0) · δ_k

As we prove below, MAXLEN_x is an upper bound on the length of the perturbed strings. Because t_k, s_k, and δ_k are typically small constants, we can regard MAXLEN_x as linear in the length of the original string LEN_x, i.e., MAXLEN_x = O(LEN_x). Now, we prove that MAXLEN_x is an upper bound on the length of any perturbed string. The bound is achieved by applying all string transformations that increase the perturbed string's length and not applying any string transformation that decreases it: if a transformation T_k = (ϕ_k, f_k) with f_k : Σ^{s_k} → 2^(Σ^{t_k}) can be applied up to δ_k times, then applying it δ_k times increases the perturbed string's length by (t_k − s_k)δ_k.
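The bound above is straightforward to compute; each transformation with t_k > s_k can add at most (t_k − s_k)·δ_k symbols, and length-decreasing transformations are simply not applied. A sketch (the triple encoding is ours):

```python
def max_len(len_x, trans):
    """MAXLEN_x for transformations given as (s_k, t_k, delta_k) triples."""
    return len_x + sum(max(t - s, 0) * d for s, t, d in trans)

# E.g., a deletion (s=1, t=0) never lengthens the string, while a word
# duplication (s=1, t=2) applied up to 2 times adds at most 2 symbols.
print(max_len(10, [(1, 0, 3), (1, 2, 2)]))  # 10 + 0 + 1*2 = 12
```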
The proof of soundness follows immediately from the fact that α, ⊔, and L̂STM overapproximate their inputs, resulting in an overapproximation F̂ ⊇ F.
The proof of complexity follows from the property that the number of non-empty hyperrectangles Ĥ^S_{i,j} is O(LEN_x) for each S ∈ DEC_S. This property follows from the definition of the string transformations and the tight perturbation space S^=: H^S_{i,j} can be non-empty only if i = j + Σ_k (t_k − s_k)·δ_k, where the sum ranges over the (T_k, δ_k) ∈ S, so for each j there is at most one non-empty i. For each H^S_{i,j}, we need to enumerate all transformations, so the complexity is O(LEN_x · n ∏_{i=1}^n δ_i) in terms of the number of LSTM cell evaluations. The interval bound propagation needs only two forward passes to compute the lower and upper bounds of the hyperrectangles, so it contributes only a constant factor. In all, the total number of LSTM cell evaluations needed is O(LEN_x · n ∏_{i=1}^n δ_i).

A.1.1 Comparison to Xu et al. (2020)
The dynamic programming approach proposed in Xu et al. (2020) is a special case of ARC where the perturbation space only contains substitutions. The abstract state g_{i,j} in their paper (Page 6, Theorem 2) is equivalent to H^{{(T_SubSyn, j)}}_{i,i} in our paper.

A.2 Handling Attention
We have introduced the memoization and abstraction of final states in Section 4. Moreover, for LSTM architectures that compute attention over each state h_i, we would like to compute an interval abstraction of each state at the ith time step, denoted H_i.
It is tempting to compute H_i as

H_i = ⨆_{S ∈ DEC_S} ⨆_j Ĥ^S_{i,j}    (12)

Unfortunately, Eq 12 does not contain states that are in the middle of a string transformation. For instance, with a transformation T_swap that swaps two adjacent words, the state reached after reading only the first word of the swapped pair is not captured by any H^S_{i,j}, so Eq 12 misses it at i = 1. But Eq 12 is correct for i = 2 because the transformation T_swap completes at time step 2.
Think of the set of all strings in a perturbation space as a tree, like in Fig. 1(b), where strings that share prefixes share LSTM states. We want to characterize the set G^S_{i,j} of LSTM states at the ith position of perturbed prefixes that have had all transformations in a space S applied to the original prefix x_{1:j}, possibly with one transformation T_k still in progress. Intuitively, G^S_{i,j} is a superset of H^S_{i,j} as defined in Section 4.
We formally define G^S_{i,j} analogously to Eq 6, additionally including states in the middle of a transformation (Eq 13). We rewrite Eq 13 by explicitly applying the transformations defining the perturbation space S, thus deriving our final equations:

G^S_{i,j} = ⋃_{k : ϕ_k(x_{a:b}) = 1} ⋃_{l=1}^{t_k} { LSTM*(z, h) | z ∈ f_{k,:l}(x_{a:b}), h ∈ H^{S_{k↓}}_{i-l, j-s_k} } ∪ { LSTM(x_j, h) | h ∈ H^S_{i-1,j-1} }    (14)

where a = j − s_k + 1 and b = j. The notation f_{k,:l}(x_{a:b}) collects the first l symbols of each z in f_k(x_{a:b}). The two cases are: (1) strings whose suffix is produced by T_k = (ϕ_k, f_k), where the last symbol is the lth symbol of the output of T_k (the first line of Eq 14), and (2) strings whose suffix (the last symbol) is not perturbed by any transformation (the second line of Eq 14).
Then, H_i can be defined as in Eq 12, with G^S_{i,j} in place of H^S_{i,j}. Lemma A.1. Eq 13 and Eq 14 are equivalent.
The above lemma can be proved similarly to Lemma 4.1.
We use interval abstraction to abstract Eq 14 similarly to Section 4.2. The total number of LSTM cell evaluations needed is of the same order as in Section 4.2, up to a constant factor accounting for the intermediate symbols of each transformation.

A.3 Handling Bi-LSTMs
Formally, we denote by x^R the string x reversed. Suppose a transformation T has a match function ϕ and a replace function f; the reversed transformation is T^R = (ϕ^R, f^R), where ϕ^R(x) = ϕ(x^R) and f^R(x) = { z^R | z ∈ f(x^R) }.
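Assuming the reversal is defined pointwise as ϕ^R(x) = ϕ(x^R) and f^R(x) = {z^R | z ∈ f(x^R)}, consistent with the Bi-LSTM description in Section 4's extensions, a sketch:

```python
def reverse_transformation(phi, f):
    """Build the reversed (phi^R, f^R) for the backward pass of a Bi-LSTM."""
    phi_r = lambda seg: phi(seg[::-1])
    f_r = lambda seg: {z[::-1] for z in f(seg[::-1])}
    return phi_r, f_r

# A made-up 2-gram transformation: "not good" -> "bad".
phi = lambda seg: seg == ("not", "good")
f = lambda seg: {("bad",)}

phi_r, f_r = reverse_transformation(phi, f)
print(phi_r(("good", "not")))  # True: matches the reversed bigram
print(f_r(("good", "not")))    # {('bad',)}
```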

A.4 Handling Tree-LSTMs
Intuitively, we replace substrings in the LSTM formalization with subtrees in the Tree-LSTM case. We denote the subtree rooted at node u as t_u and the size of t_u as SIZE_{t_u}. The state H^S_u denotes the set of Tree-LSTM states after reading subtree t_u as perturbed by a tight perturbation space S. The initial states are the states at the leaf nodes, and the final state is H^S_root. We provide transition equations for the three tree transformations in Fig 3.

A.4.1 Merge states
For a non-leaf node v, we merge two states, one from each child of v (Eq 15), where v_1 and v_2 are the children of v and TRLSTM denotes the Tree-LSTM cell, which takes two states as inputs. The notation S − S' computes a tight perturbation space by subtracting S' from S. Notice that Eq 15 is general: it applies to any tight perturbation space S containing these three tree transformations.
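The enumeration underlying the merge can be sketched as follows, under a hypothetical encoding (not the paper's) of a tight perturbation space as a mapping from transformation name to budget: merging the two children enumerates every split of S into S' (routed to the left subtree) and S − S' (routed to the right).

```python
from itertools import product

# Sketch: enumerate all decompositions (S', S - S') of a perturbation space,
# encoded as {transformation_name: budget}.

def splits(space):
    names = list(space)
    for alloc in product(*[range(space[n] + 1) for n in names]):
        left = dict(zip(names, alloc))                    # S'
        right = {n: space[n] - left[n] for n in names}    # S - S'
        yield left, right

S = {"DelStop": 2, "SubSyn": 1}
pairs = list(splits(S))
# (2+1) * (1+1) = 6 ways to distribute the budgets between the two children.
```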

A.4.2 T SubSyn
We first show the computation of H^S_u for a leaf node u. Substitution happens only at the leaf nodes, because only the leaf nodes correspond to words.

Table 8: Transformations defining S_review.
Trans | Description
T_review1 | Substitute a phrase in the set A with another phrase in A.
T_review2 | Substitute a phrase in the set B with another phrase in B, or substitute a phrase in C with another phrase in C.
T_review3 | Delete the phrase "one of" from "one of the most ..." or from "one of the ...est".
T_review4 | Duplicate a question mark "?" or an exclamation mark "!".

A.4.3 T Dup
T Dup can be seen as a subtree substitution at leaf node u.
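Under the same hypothetical window encoding used above, this substitution view of T_Dup is a one-liner: the single-word window at a leaf is replaced by the duplicated pair.

```python
# Sketch: T_Dup as a substitution at a leaf node u (hypothetical encoding):
# the single-word window (w,) is replaced by the two-leaf subtree (w, w).
phi_dup = lambda window: len(window) == 1
f_dup = lambda window: {(window[0], window[0])}
```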

A.4.4 T DelStop
Things get tricky for T_DelStop because {(T_DelStop, δ)} can delete a whole subtree t_v if (1) the subtree contains only stop words and (2) SIZE(t_v) ≤ δ. We call such a subtree t_v deletable if both (1) and (2) hold. Besides the merging equation Eq 15, we provide another transition equation (Eq 16) for H^S_v, where v is any non-leaf node with two children v_1, v_2 and a perturbation space S: the state is propagated from one child when the other child's subtree is deletable under (1) and (2), and the case contributes ∅ otherwise.
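The deletable-subtree test can be sketched directly from conditions (1) and (2). The tree type, stop-word list, and the choice of measuring size as the number of words are all illustrative assumptions, not taken from the paper's code.

```python
# Sketch of the deletable test for T_DelStop (hypothetical tree type).

STOP_WORDS = {"the", "a", "of", "and", "to"}   # illustrative subset

class Node:
    def __init__(self, word=None, children=()):
        self.word = word                # set on leaves only
        self.children = list(children)

def leaves(t):
    if not t.children:
        return [t.word]
    return [w for c in t.children for w in leaves(c)]

def deletable(t, delta):
    """Deletable iff (1) every word is a stop word and (2) the size
    (number of words, our assumption) is at most delta."""
    ws = leaves(t)
    return all(w in STOP_WORDS for w in ws) and len(ws) <= delta

t = Node(children=[Node("of"), Node("the")])
```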

A.4.5 Soundness and Complexity
We use the interval abstraction to abstract the transition equations for Tree-LSTMs, similarly to Section 4.2, and bound the total number of LSTM/Tree-LSTM cell evaluations needed accordingly. The term (∑_{i=1}^n δ_i)^2 comes from Eq 15, as we need to enumerate S' for each S in the decomposition set.

A.5 Experimental Setup
We conduct all experiments on a server running Ubuntu 18.04.5 LTS with V100 32GB GPUs and Intel Xeon Gold 5115 CPUs running at 2.40GHz.

A.5.1 Definition for S review
We design S_review by inspecting highly frequent n-grams in the movie-review training set. Formally, S_review is built from T_review1, T_review2, T_review3, and T_review4, defined in Table 8, with
A = {"this is", "this 's", "it is", "it 's"}
B = {"the movie", "the film", "this movie", "this film", "a movie", "a film"}
C = {"the movies", "the films", "these movies", "these films"}
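For concreteness, T_review1 from Table 8 can be sketched under the same hypothetical (predicate, set-valued replace) encoding used earlier: it matches any 2-gram in A and replaces it with any other phrase in A.

```python
# Sketch of T_review1: substitute a phrase in A with another phrase in A.

A = [("this", "is"), ("this", "'s"), ("it", "is"), ("it", "'s")]

def phi_review1(window):
    # Match: the 2-gram window is one of the phrases in A.
    return window in A

def f_review1(window):
    # Replace: any *other* phrase in A.
    return {p for p in A if p != window}

outs = f_review1(("it", "is"))
```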

A.5.2 Implementation of ARC
We provide a general implementation of ARC on LSTMs against arbitrary user-defined string transformations. We also provide specific implementations of ARC on LSTM, Tree-LSTM, and Bi-LSTM against the three transformations in Table 2 and Fig 3. Fixing the transformations allows us to optimize the specific implementations to exploit the full parallelism of the GPU, so they are faster than the general implementation. We conduct all our experiments with the specific implementations except for S_review.

A.5.3 Details of Training
A3T: A3T has two instantiations, A3T (HotFlip) and A3T (Enum). They differ in how they explore the augmentation space in A3T. We show A3T (HotFlip) for comparison, but ARC also outperforms A3T (Enum).
ASCC: ASCC updates the word embedding during training by default. In our experiments, we fix the word embedding for ASCC.
ARC: All models trained by ARC have hidden-state and cell-state dimensions set to 100. We adopt a curriculum-based training method (Huang et al., 2019; Jia et al., 2019) for ARC, using a hyperparameter λ to weigh between the normal loss and the abstract loss, and a hyperparameter ε to gradually increase the radius of the synonym sets. We gradually increase the two hyperparameters from 0 to their maximum values over T_1 epochs, and then keep training with their maximum values for T_2 epochs. For the experiments in Table 4, we tune the maximum of λ from 0.5 to 1.0 (in steps of 0.1) for LSTM and Bi-LSTM models and from 0.05 to 0.10 (in steps of 0.01) for Tree-LSTM models. For the other experiments, which only use word substitutions, we fix the maximum of λ to 0.8, following Jia et al. (2019).
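The curriculum schedule described above can be sketched as a linear ramp; the function name and the concrete values of the maxima and epoch counts below are illustrative assumptions.

```python
# Sketch: each hyperparameter ramps linearly from 0 to its maximum over T1
# epochs, then stays at the maximum for T2 further epochs.

def schedule(epoch, max_value, T1):
    return max_value * min(epoch / T1, 1.0)

lam_max, T1, T2 = 0.8, 10, 5           # illustrative values
lams = [schedule(e, lam_max, T1) for e in range(T1 + T2 + 1)]
```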
For every experiment, the maximum of ε during training is determined by the size of the word substitutions in the perturbation space. For example, {(T_DelStop, 2), (T_SubSyn, 2)} defines the maximum of ε as 2, and {(T_DelStop, 2), (T_Dup, 2)} defines the maximum of ε as 0.
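This rule is simple enough to sketch directly; the encoding of a perturbation space as (name, δ) pairs is a hypothetical convenience.

```python
# Sketch: the maximum of epsilon is the total budget of synonym-substitution
# transformations in the perturbation space.

def max_epsilon(space, substitution_names=("SubSyn",)):
    return sum(delta for name, delta in space if name in substitution_names)

e1 = max_epsilon([("DelStop", 2), ("SubSyn", 2)])   # substitution budget 2
e2 = max_epsilon([("DelStop", 2), ("Dup", 2)])      # no substitutions
```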
We use early stopping for the other training methods and set the early-stopping epoch to 5.
We provide the training scripts and all trained models in supplementary materials.

A.6 Evaluation Results
The full results of comparison to SAFER are shown in Table 9.
Comparison to Huang et al. (2019). We use {(T_SubSyn, 3)} on the SST2 dataset for the comparison between ARC and Huang et al. (2019). We directly quote the results from their paper.
ARC trains more robust LSTMs than the CNNs trained by Huang et al. (2019). Table 10 shows that ARC results in models with higher accuracy (+1.6), HF Acc. (+1.1), CF Acc. (+28.8), and EX Acc. (+3.4) than those produced by Huang et al. (2019).
Effectiveness of ARC-A3T. We can apply the idea of A3T to ARC, extending ARC to abstract any subset of a given perturbation space and to augment the remaining perturbation space. We show the effectiveness of this extension here. We evaluate ARC-A3T on the same perturbation spaces as A3T. For each perturbation space, ARC-A3T has four instantiations: abstracting the whole perturbation space (which degenerates to ARC), abstracting the first perturbation space ({(T_DelStop, 2)} or {(T_Dup, 2)}), abstracting the second perturbation space ({(T_SubSyn, 2)}), and augmenting the whole perturbation space. We use enumeration for augmentation. We do not test the last instantiation because enumerating the whole perturbation space is infeasible for training. We further evaluate the trained models on different perturbation sizes, i.e., {(T_DelStop, δ), (T_SubSyn, δ)} and {(T_Dup, δ), (T_SubSyn, δ)} with δ = 1, 2, 3.