BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks

Adversarial attacks expose important blind spots of deep learning systems. While word- and sentence-level attack scenarios mostly deal with finding semantic paraphrases of the input that fool NLP models, character-level attacks typically insert typos into the input stream. It is commonly thought that these are easier to defend via spelling correction modules. In this work, we show that both a standard spellchecker and the approach of Pruthi et al. (2019), which is trained to defend against insertions, deletions and swaps, perform poorly on the character-level benchmark recently proposed in Eger and Benz (2020), which includes more challenging attacks such as visual and phonetic perturbations and missing word segmentations. In contrast, we show that an untrained iterative approach which combines context-independent character-level information with context-dependent information from BERT's masked language modeling can perform on par with human crowd-workers from Amazon Mechanical Turk (AMT) supervised via 3-shot learning.


Introduction
Adversarial attacks on machine learning systems are malicious modifications of their inputs designed to fool machines into misclassification but not humans (Goodfellow et al., 2015). One of their goals is to expose blind spots of deep learning models, which can then be shielded against. In the NLP community, typically two different kinds of attack scenarios are considered. "High-level" attacks paraphrase (semantically or syntactically) the input sentence (Iyyer et al., 2018; Alzantot et al., 2018; Jin et al., 2020) so that the classification label does not change, but the model changes its decision. Often, this is framed as a search problem where the attacker has at least access to model predictions (Zang et al., 2020). "Low-level" attackers operate on the level of characters and may consist of adversarial typos (Belinkov and Bisk, 2018; Ebrahimi et al., 2018a; Pruthi et al., 2019; Jones et al., 2020) or replacement of characters with similarly looking ones (Eger et al., 2019; Li et al., 2020a). Such attacks may also be successful when the attacker operates in a blind mode, without having access to model predictions, and they are arguably more realistic, e.g., in social media. However, Pruthi et al. (2019) showed that orthographic attacks can be addressed by placing a spelling correction module in front of a downstream classifier, which may be considered a natural solution to the problem.¹ In this work, we apply their approach to the recently proposed benchmark Zéroe of Eger and Benz (2020), illustrated in Table 1, which provides an array of cognitively motivated orthographic attacks, including missing word segmentation, phonetic and visual attacks. We show that the spelling correction module of Pruthi et al. (2019), which has been trained on simple typo attacks such as character swaps and character deletions, fails to generalize to this benchmark. This motivates us to propose a novel technique for addressing various forms of orthographic adversaries that does not require training on the low-level attacks: first, we obtain probability distributions over likely true underlying words from a dictionary using a context-independent extension of the Levenshtein distance; then we use the masked language modeling objective of BERT, which gives likelihoods over word substitutions in context, to refine the obtained probabilities. We iteratively repeat this process to improve the word context from which to predict clean words. Finally, we apply a source-text-independent language model to produce fluent output text.

Figure 1: A high-level overview of the processing of an example sentence in our adversarial-defense pipeline. The sentences shown for the hypothesis have been created by choosing the maximum of their associated probability distributions over words.
Our contributions: (i) We empirically show that this approach performs much better than the trained model of Pruthi et al. (2019) on the Zéroe benchmark. Furthermore, (ii) we also evaluate human robustness on Zéroe and (iii) demonstrate that our iterative approach, which we call BERT-Defense, sometimes even outperforms human crowd-workers trained via 3-shot learning.
Related work

Zeng et al. (2020) classify adversarial attack scenarios in terms of the accessibility of the victim model to the attacker:² white-box attackers (Ebrahimi et al., 2018b) have full access to the victim model including its gradient to construct adversarial examples. In contrast, black-box attackers have only limited knowledge of the victim models: score-based (Alzantot et al., 2018; Jin et al., 2020) and decision-based attackers (Ribeiro et al., 2018) require access to the victim models' prediction scores (classification probabilities) and final decisions (predicted class), respectively. A score-based black-box attacker of particular interest in our context is BERT-ATTACK (Li et al., 2020b). BERT-ATTACK uses the masked language model (MLM) of BERT to replace words with other words that fit the context. BERT-ATTACK is related to our approach because it uses BERT's MLM in attack mode while we use it in defense mode. Further, in our terminology, BERT-ATTACK is a high-level attacker, while we combine BERT with an edit distance based approach to restore low-level adversarial attacks. Blind attackers make the fewest assumptions and have no knowledge of the victim models at all. Arguably, they are most realistic, e.g., in the context of online discussion forums and other forms of social media where users may not know which model is employed (e.g.) to censor toxic comments, and users may also not have (large-scale) direct access to model predictions.

¹ One could argue that such a pipeline solution is not entirely satisfactory from a more theoretical perspective, and that downstream classifiers should be innately robust to attacks in the same way as humans.
² Another recent survey of adversarial attacks in NLP is provided by Roth et al. (2021).
In terms of blind attackers, Eger et al. (2019) design the visual perturber VIPER which replaces characters in the input stream with visual nearest neighbors, an operation to which humans are seemingly very robust.³ Eger and Benz (2020) propose a canon of 10 cognitively inspired orthographic character-level blind attackers. We use this benchmark, which is illustrated in Table 1, in our application scenario. While Eger et al. (2019) and Eger and Benz (2020) are only moderately successful in defending against their orthographic attacks with adversarial learning (Goodfellow et al., 2015) (i.e., including perturbed instances at train time), Pruthi et al. (2019) show that placing a word recognition (correction) module in front of a downstream classifier may be much more effective. They use a correction model trained to recognize words corrupted by random adds, drops, swaps, and keyboard mistakes. Zhou et al. (2019) also train on the adversarial attacks (insertion, deletion, swap as well as word-level) against which they defend. In contrast, we show that an untrained attack-agnostic iterative model based on BERT may perform competitively even with humans (crowd-workers) and that this correction module may further be improved by leveraging attack-specific knowledge. Jones et al. (2020) place an encoding module, which should map orthographically similar words to the same (discrete) 'encoding', before the downstream classifier to improve robustness against adversarial typos. However, in contrast to Pruthi et al. (2019) and BERT-Defense, their model does not restore the attacked sentence to its original form, so it is less desirable in situations where knowing the underlying surface form may be relevant (e.g., for human introspection or in tasks such as spelling normalization).
In contemporaneous work, Hu et al. (2021) use BERT for masked language modeling together with an edit distance to correct a misspelled word in a sentence. They assume a single misspelled word that they correct by selecting from a set of edit distance based hypotheses using BERT. In contrast, in our approach we assume that multiple or even all words in the sentence have been attacked using adversarial attacks and that we do not know which ones. Then, we use an edit distance and integrate its results probabilistically with context information obtained by BERT, rather than using edit distance only for candidate selection.

Methods
Our complete model, which is outlined in Figure 1 on a high level, has three intuitive components. The first component is context-independent and tries to detect the tokens in a sentence from their given (potentially perturbed) surface forms. This makes sense, since we assume orthographic low-level attacks on our data. The second component uses context, via masked language modeling in BERT, to refine the probability distributions obtained from the first step. The third component uses a language model (in our case, GPT) to make a choice between multiple hypotheses. In the following, we describe each of the three components.

Context-independent probability
In the first step of our sentence restoration pipeline, we use a modified Levenshtein distance to convert the sentence into a list of probability distributions over word-piece tokens from a dictionary D. For the dictionary, we choose BERT's (Devlin et al., 2019) default word-piece dictionary.
We begin by splitting the attacked sentence S at spaces into word tokens w̃_i. However, to be able to use our word-piece dictionary D, we need to find the appropriate segmentation of the tokens into word-pieces.
Modified Levenshtein distance. We developed a modified version of the Wagner-Fischer algorithm (Wagner and Fischer, 1974) that calculates a Levenshtein distance to substrings of the input string and keeps track of start as well as end indices of matching substrings. For each w̃_i in S, this algorithm (which is described in Appendix A.1) calculates the substring Levenshtein distance dist to every word-piece w_d in D.
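The substring variant can be sketched as follows. This is a simplified reconstruction, not the exact algorithm of Appendix A.1 (which additionally uses modified operation costs): the first DP row is zero so a match may begin anywhere in the token, and taking the minimum over the last row lets it end anywhere.

```python
def substring_levenshtein(token, piece):
    """Edit distance from `piece` to its best-matching substring of `token`,
    plus the start and end indices of that substring (simplified sketch of
    the modified Wagner-Fischer algorithm)."""
    n, m = len(token), len(piece)
    prev = [0] * (n + 1)          # row 0 is free: a match may start anywhere
    starts = list(range(n + 1))   # start index of the matched substring
    for i in range(1, m + 1):
        cur = [i] + [0] * n       # cost of deleting piece[:i] entirely
        cur_starts = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev[j - 1] + (piece[i - 1] != token[j - 1])
            ins = cur[j - 1] + 1  # char of token unexplained by piece
            dele = prev[j] + 1    # char of piece missing in token
            cur[j] = min(sub, ins, dele)
            if cur[j] == sub:
                cur_starts[j] = starts[j - 1]
            elif cur[j] == ins:
                cur_starts[j] = cur_starts[j - 1]
            else:
                cur_starts[j] = starts[j]
        prev, starts = cur, cur_starts
    end = min(range(n + 1), key=lambda j: prev[j])
    return prev[end], starts[end], end
```

For example, matching the word-piece "board" against the attacked token "skateboard" yields distance 0 with the matching substring spanning positions 5 to 10.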
Segmentation hypothesis. We store the computed distances dist(w̃_i, w_d) in a dictionary C_i that maps each start-index s and end-index e to a list of distances, i.e., C_i associates

  C_i[s, e] = { dist(w̃_i, w_d) : w_d ∈ D_{s,e} }.

Here, D_{s,e} selects the subset of all word-pieces in D that match w̃_i at the substring between s and e. Using C_i, we can then perform a depth-first search to compose w̃_i from start and end-indices in C_i. For example, a 10 character word w̃_i could be segmented into two word-pieces that match the substrings from positions 1-5 and 6-10, respectively, or a single word that matches from 1-10. Let c_i be the set of all segmentations of w̃_i from start and end indices. For example, c_i could be {((1, 5), (6, 10)), ((1, 10))}. For each segmentation c_i,α ∈ c_i, we then calculate a total distance d(c_i,α) as a sum of the minimum distances of all parts:

  d(c_i,α) = Σ_{(s,e) ∈ c_i,α} min C_i[s, e].   (1)

Using the total distances to segment each token w̃_i, we can now create hypotheses H about how the whole sentence S consisting of n tokens should be segmented into word-pieces. For this, we calculate the Cartesian product between the sets of possible segmentations for each word w̃_i, i = 1, ..., n:

  H = c_1 × c_2 × ... × c_n.

We set the loss of one hypothesis h = (c_1,α_1, ..., c_n,α_n) ∈ H as the sum of the total distances of its parts that we calculated in Eq. (1). By evaluating the softmax on the negative total distances of the hypotheses, we calculate the probability that a hypothesis h_v ∈ H is equal to the true (unknown) segmentation h* of the n tokens:

  P(h_v = h* | S) = exp(-d(h_v)) / Σ_{h_u ∈ H} exp(-d(h_u)).   (2)

We will refer back to these probabilities in §3.3.
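The segmentation step can be made concrete with a minimal sketch. The span distances below are hypothetical toy values (the real pipeline computes them against BERT's word-piece vocabulary):

```python
import math

def softmax_neg(losses, temp=1.0):
    """Softmax on negated (lower-is-better) losses, as in Eq. (2)."""
    exps = [math.exp(-l / temp) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]

def segmentations(C, length):
    """Depth-first search over (start, end) spans in C covering 0..length."""
    def dfs(pos):
        if pos == length:
            yield ()
        for (s, e) in C:
            if s == pos:
                for rest in dfs(e):
                    yield ((s, e),) + rest
    return list(dfs(0))

# Toy token of length 10 with per-span distance lists (as stored in C_i).
C = {(0, 5): [1.0, 2.0], (5, 10): [0.5], (0, 10): [2.0]}
segs = segmentations(C, 10)     # [((0, 5), (5, 10)), ((0, 10),)]
totals = [sum(min(C[span]) for span in seg) for seg in segs]  # Eq. (1)
probs = softmax_neg(totals)     # segmentation probabilities
```

For a multi-token sentence, the hypothesis space H is then the Cartesian product of the per-token segmentation sets, e.g., via itertools.product.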
Word probability distributions. In a hypothesis h ∈ H, a token w̃_i has a single segmentation of start and end indices associated with it, c_i,α. For all start- and end-indices (s, e), C_i[s, e] stores the distances of the words that match w̃_i between s and e. Let D_{s,e} again be the dictionary subset containing all those words. Let w_d be a word-piece in D_{s,e} and let w* ∈ D_{s,e} be the true match for the substring between s and e of w̃_i. Then, we can compute a context-independent probability that w_d is equal to w*, by evaluating the softmax on the negative distances stored in C_i:

  P(w_d = w* | w̃_i) = exp(-C_i[s, e][w_d]) / Σ_{w' ∈ D_{s,e}} exp(-C_i[s, e][w']).

When we do this for all words in h and concatenate the results, we get a vector V_h of probability distributions over dictionary word-pieces. This is illustrated in Figure 2. We introduce the following notation to select a probability distribution based on its index in h using the subscript j:

  P_h,j(w_d = w* | w̃) := V_h[j].

Domain-specific distance. In the remainder, we will refer to the way of calculating the substring distance as described above as attack-agnostic. Beyond this, we also aim to leverage domain-specific knowledge. We refer to such an augmented distance as the domain-specific distance dist_M. Here, we modify the operation costs in the substring Levenshtein distance in the following situations:

Figure 2: A context-independent probability distribution over words calculated for an example input sentence. There are multiple segmentation hypotheses associated with the sentence that each consist of a sequence of probability distributions over word-tokens.

1. Edit distance is reduced for visually similar characters. This builds on visual character representations (Eger et al., 2019). See Appendix A.3 for details.
2. Addressing intruder attacks, we reduce deletion costs depending on the frequency f of the character in the source word. Our assumption is that the same intruder symbol may be repeated in one word. Thus, we decay the cost exponentially for increasing frequency using the formula 0.75^(f-1).
3. Vowel insertion cost is reduced to 0.3 for words that contain no vowels.
To address letter-shuffling, we additionally compute an anagram distance dist_A measuring how close the attacked word w̃_i is to being an anagram of the dictionary word w_d. Let m be the number of characters that are in one of the two words, but not in the other. Our anagram distance is then a fixed base cost plus a term linear in m: when two words are permutations of each other, the anagram distance is minimal, and otherwise it increases linearly in the number of differing characters between the two words. We then take the minimum of the anagram distance and the substring Levenshtein distance with modified operation costs dist_M to obtain the domain-specific distance dist_F:

  dist_F(w̃_i, w_d) = min(dist_A(w̃_i, w_d), dist_M(w̃_i, w_d)).
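A minimal sketch of the anagram component and the final minimum; the base cost below is a hypothetical stand-in, not the paper's tuned value:

```python
from collections import Counter

def anagram_mismatch(a, b):
    """m: characters (with multiplicity) occurring in one word but not the other."""
    diff = (Counter(a) - Counter(b)) + (Counter(b) - Counter(a))
    return sum(diff.values())

def domain_specific_distance(attacked, word, dist_M, base_cost=1.0):
    """dist_F = min(dist_A, dist_M). `base_cost` is a hypothetical constant;
    `dist_M` is the substring Levenshtein distance with modified costs."""
    dist_A = base_cost + anagram_mismatch(attacked, word)
    return min(dist_A, dist_M(attacked, word))
```

So a fully shuffled word like "tca" still sits close to "cat" via the anagram branch, even when the plain edit distance is large.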

Context-dependent probability using BERT
In the following, we describe the context-based improvement for a single hypothesis h ∈ H. In Figure 3, the whole process is illustrated for an example sentence. The number of required iterations should scale linearly with the number of tokens in the hypothesis, so we perform 2 · |h| iterations in total. To perform one improvement iteration, we perform the following steps: 1) Select an index j of a token that will be masked for this iteration. 2) For the next part, we slightly modify BERT for masked LM. Instead of using single tokens as inputs as in BERT, we want to use our context-independent probability distributions over word-piece tokens. Thus, for each token w_h,j in h, we embed all relevant tokens w_d from the context-independent process described above using BERT's embedding layer and combine them into a weighted average embedding using weights P_h,j(w_d = w* | w̃).

Figure 3: Iterative, context-based improvements of the word predictions using BERT for masked LM. Each iteration, a different token will be masked. We calculate context-dependent probabilities using Eq. (3) and integrate them with our context-independent probabilities in Eq. (4).
3) We now bypass BERT's embedding layer and feed the weighted average embeddings and the embedding for the mask token directly into the next layers of BERT (although BERT has only been trained on single-token embeddings, we empirically found that feeding in averaged embeddings produces very sensible results). As a result, BERT provides us with a vector of scores S_BERT for how well the words from the word-piece dictionary D fit into the position of the masked word. 4) By applying the softmax on these scores, we obtain a new probability distribution over word-pieces which is dependent on the context c of the token at position j:

  P_h,j(w_d = w* | c) = exp(S_BERT[w_d]) / Σ_{w' ∈ D} exp(S_BERT[w']).   (3)

5) We make the simplifying assumption that each word is attacked independently from the other words. Thus, the context c is independent of the attack on the word w̃. This means that the following equality holds:

  P_h,j(w_d = w* | w̃, c) ∝ P_h,j(w_d = w* | w̃) · P_h,j(w_d = w* | c).   (4)

6) We go back to step 1) and use P_h,j(w_d = w* | w̃, c) to create the average embedding at position j.
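In essence, the integration amounts to a softmax over BERT's scores followed by an elementwise product with the edit-distance probabilities and renormalization. A sketch with illustrative toy words and scores (not actual BERT outputs):

```python
import math

def integrate(p_edit, bert_scores, temp=1.0):
    """Combine context-independent probabilities P(w=w* | attacked token)
    with masked-LM scores: softmax the scores (cf. Eq. (3)), multiply
    with p_edit under the independence assumption (cf. Eq. (4)),
    then renormalize. `temp` is a softmax temperature."""
    m = max(bert_scores.values())          # subtract max for stability
    exps = {w: math.exp((s - m) / temp) for w, s in bert_scores.items()}
    z = sum(exps.values())
    p_ctx = {w: e / z for w, e in exps.items()}
    joint = {w: p_edit.get(w, 0.0) * p_ctx[w] for w in p_ctx}
    total = sum(joint.values())
    return {w: v / total for w, v in joint.items()}
```

When the edit distance cannot decide between two candidates, the context scores break the tie, e.g., "boat" wins over "vote" in a sentence about riding.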

Selecting the best hypothesis with GPT
After performing the context-based improvements, we are left with multiple hypotheses h ∈ H. Each of them has a hypothesis probability P(h = h* | S) and a list of word-piece probabilities of length |h| over dictionary words associated with it. Now, we finally collapse the probability distributions by taking the argmax to form actual sentences S_h:

  S_h = (argmax_{w_d} V_h[1], ..., argmax_{w_d} V_h[|h|]).   (5)

This allows us to use GPT (Radford et al., 2018) to calculate a language modeling (perplexity) score LM_{S_h} for each sentence. Using softmax, we again transform these scores into a probability distribution that describes the probability of a segmentation hypothesis h_v ∈ H being the correct segmentation h*, based on the restored sentences S_H:

  P(h_v = h* | S_H) = exp(-LM_{S_{h_v}}) / Σ_{h_u ∈ H} exp(-LM_{S_{h_u}}).   (6)

The original probability P(h_v = h* | S) assigned to each hypothesis is only based on the results of the Levenshtein distance for the attacked sentence S. Thus, as P(h_v = h* | S) only depends on the character-level properties of the attacked sentence and P(h_v = h* | S_H) only depends on the semantic properties of the underlying sentences, it makes sense to assume that these distributions are independent. This allows us to simply multiply them to get a probability distribution that captures semantic as well as character-level properties:

  P(h_v = h*) ∝ P(h_v = h* | S) · P(h_v = h* | S_H).

Figure 4: An example of how we use OpenAI GPT to decide which hypothesis to choose as our final sentence prediction (Hypothesis 1, p=72%: "He is riding a starboard."; Hypothesis 2, p=28%: "He is riding a skateboard."). The original probability of the segmentation hypothesis calculated in Eq. (2) is multiplied with a probability calculated from the language modeling score using Eq. (6).

In Figure 4, we visualize the above described process for a specific example with only 2 hypotheses.
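The selection step can be sketched as follows. The LM scores here are illustrative placeholders where higher means more fluent; their sign and scale are assumptions, as the paper derives them from GPT perplexity:

```python
import math

def select_hypothesis(p_seg, lm_scores, temp=1.0):
    """Pick the hypothesis maximizing P(h=h* | S) * P(h=h* | S_H).

    p_seg: segmentation probabilities from Eq. (2).
    lm_scores: language-model fluency scores (higher = more fluent).
    """
    m = max(lm_scores)                            # stabilize the softmax
    exps = [math.exp((s - m) / temp) for s in lm_scores]
    z = sum(exps)
    p_lm = [e / z for e in exps]                  # cf. Eq. (6)
    joint = [ps * pl for ps, pl in zip(p_seg, p_lm)]
    return max(range(len(joint)), key=joint.__getitem__)
```

As in the Figure 4 example, a much more fluent hypothesis can override a higher segmentation probability.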

Experimental Setup
To obtain adversarially attacked sentences against which to defend, we use the Eger and Benz (2020) benchmark Zéroe of low-level adversarial attacks. This benchmark contains implementations for a wide range of cognitively inspired adversarial attacks such as letter shuffling, disemvoweling, phonetic and visual attacks. The attacks are parameterized by a perturbation probability p ∈ [0, 1] that controls how strongly the sentence is attacked.
We decided to slightly modify two of the attacks in Zéroe, the phonetic and the visual attacks. On close inspection, we found the phonetic attacks to be too weak overall, with too few perturbations per word. The visual attacks in Zéroe are based on pixel similarity, which is similar to the visual similarity based defense in our domain-specific model. Thus, to avoid attacking with the same method we defend with, we decided to switch to a description based visual attack model (DCES), just like in the original paper (Eger et al., 2019).⁴ Our modifications are described in Appendix A.2.
Evaluation. Instead of evaluating on a downstream task, we evaluate on the task of restoring the original sentences from the perturbed sentences. This allows us to more easily compare to human performance. It also provides a more difficult test case, as a downstream classifier may infer the correct solution even with part of the input destroyed or omitted. Finally, being able to correct the input is also important when the developed tools would be used for humans, e.g., in spelling correction.

⁴ Using the description based defense and pixel based attacks would have been possible just as well, but we believe doing it reversely is consistent with the original specification in Eger et al. (2019).
We evaluate the similarity of the sentences to the ground-truth sentences with the following metrics: 1. Percent perfectly restored (PPR). The percent of sentences that have been restored perfectly. This is a coarse-grained sentence-level measure.
2. Edit distance. The Levenshtein (edit) distance measures the number of insertions, deletions, and substitutions necessary (on the character level) to transform one sequence into another.
3. MoverScore (Zhao et al., 2019). MoverScore measures the semantic similarity between two sentences using BERT. It has been shown to correlate highly with human judgments as a semantic evaluation metric.
For all of the metrics, letter case was ignored.
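The edit distance metric, with the case folding applied to all metrics, can be sketched as:

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance, ignoring letter case."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```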
Attack scenarios. We sampled 400 sentences from the GLUE (Wang et al., 2018) STS-B development dataset for our experiments. We use various attack scenarios to attack the sentences: i) Each of the attack types of the Zéroe benchmark (see Table 1). We set p = 0.3 throughout.
ii) To evaluate how higher perturbation levels influence restoration difficulty, we create 5 attack scenarios for one attack scenario (we randomly chose phonetic attacks) with perturbation levels p from 0.1 to 0.9.
iii) We add combinations of attacks: these are performed by first attacking the sentence with one attack and then with another.

Figure 5: Comparison between BERT-Defense and the two baseline adversarial defense tools "pyspellchecker" and "ScRNN defense". The x-labels describe the attack and perturbation level the sentences were attacked with, before applying one of the adversarial defense methods. For conditions with two attack types, the perturbations were applied in order. For edit distance, lower is better. For the other metrics, higher is better. Exact values for the results are included in the appendix in Table 6.

Baselines. We compare to the following baselines: (a) the Pyspellchecker (Barrus, 2020), a simple spellchecking algorithm that uses the Levenshtein distance and word frequency to correct errors in text; (b) "ScRNN defense" from the Pruthi et al. (2019) paper. This method uses an RNN that has been trained to recognize and fix character additions, swaps, deletions and keyboard typos. Further, as we use Zéroe, a cognitively inspired attack benchmark supposed to fool machines but not humans, it is especially interesting to see how BERT-Defense compares to human performance. Thus, (c) we include human performance, obtained from a crowd-sourcing experiment on Amazon Mechanical Turk (AMT). Note that humans are often considered upper bounds in such settings.
Human experiment. Twenty-seven subjects were recruited using AMT (21 male, mean age 38.37, std age 10.87) using PsiTurk (Gureckis et al., 2016). Participants were paid $3 plus up to $1 score-based bonus (mean bonus 0.56, std bonus 0.40) for restoring about 60 adversarially attacked sentences. The task took on average 43.9 minutes with a standard deviation of 20.1. Twenty of the subjects were native English speakers, seven were non-native speakers. The two groups did not significantly differ regarding their edit distances to the true underlying sentences (unequal variance t-test, p = .85).
We sampled 40 random sentences from nine of our attack scenarios plus 40 random (non-attacked) sentences from the original document. Each sentence was restored by four different humans. The whole set of 1600 sentences (10 scenarios × 40 sentences × 4 repetitions) was then randomly split into 27 sets of about 60 sentences. No split contained the same sentence multiple times. Each of the 27 participants was assigned one of these sets. After a short instruction text, the participants were shown three examples of how to correctly restore a sentence ("3-shot learning"). Then they were shown the sentences in their set sequentially and entered their attempts at restoring the sentences into a text field.

Results and Discussion
Comparison with baselines. Figure 5 visualizes the results (full results are in the appendix). BD_agn (BERT-Defense, attack-agnostic) significantly outperforms both baselines regarding MoverScore and PPR for all random attack scenarios (p < 0.01, equal variance t-test). However, only BD_spec (BERT-Defense, domain-specific) achieves a lower edit distance than the baselines. This discrepancy between the measures is explained by the fact that, by taking context into account, BERT-Defense searches for the best restoration in the space of sensible sentences, while Pyspellchecker searches for the best restoration for each word individually. Although ScRNN defense uses an RNN and is able to take context into account, we found that it also mainly seems to restore the words individually and rarely produces grammatically correct sentences for strongly attacked inputs. Table 3, which illustrates failure cases of all models, supports this. When BERT-Defense fails to recognize the correct underlying sentence, it outputs a mostly different sentence that usually does make some sense, but has little in common with the ground-truth sentence. This results in much higher edit distances than the failure cases of the baselines, which produce grammatically wrong sentences while restoring individual words the best they can (this sometimes means not trying at all). Interestingly, humans tend to produce similar failure cases as BERT-Defense. When comparing the performance on specific attacks, we see a consistent margin of about 0.2 MoverScore and 15-35 percentage points PPR between BD_agn and the baselines across all attacks. Exceptions include inner-shuffle, for which ScRNN defense is on par with BD_agn, and segmentation attacks, which hurt the performance of the baselines far more than the performance of BERT-Defense, which includes segmentation hypotheses as an essential part of its restoration pipeline.
For BD_spec, we see gains for attacks where we leverage domain-specific knowledge. The biggest gains of around 0.25 MoverScore are achieved against full-shuffle, inner-shuffle and disemvoweling attacks.
In the No attack condition, we checked if the adversarial defense methods introduce mistakes when presented with clean sentences. Indeed, all models introduce some errors: all three evaluation metrics show that BERT-Defense introduces a few more errors than Pyspellchecker but less than ScRNN defense.
Comparison with humans. As stated before, we evaluate human performance on 40 random sentences for each of nine attacks and the no attack condition (see appendix). For each of the sentences, we obtain restorations from 4 crowd-workers. For each attack scenario, we evaluate our metrics on all restorations of these 40 sentences and average the results. The results on the 40 attacked sentences are shown in Figure 6. While BD_agn performs slightly worse than humans, BD_spec matches human performance with respect to all three evaluation metrics. Regarding performance on specific attacks, humans are still better than BERT-Defense when it comes to defending phonetic attacks, while they have a hard time defending full-shuffle attacks. The evaluations for the No attack setting reveal that the crowd-workers in our experiment do make quite a few copying mistakes. In fact, they introduce slightly more mistakes than BERT-Defense.

Ablation study. We perform an ablation study to assess the contribution of each individual component of BERT-Defense. For the No Levenshtein distance condition, we created the context-independent probability distribution by setting the probability of known words (words in the dictionary) in the attacked dataset to one and using a uniform random distribution for all unknown words.

Attacked | ScRNN | BD_agn
To lorge doog's wronsing in sum grass. | to lorge doog's wronsing in sum grass. | two large dogs rolling in the grass.
Two large dogs runningin some grass. | two large dogs runningin some grass. | two large dogs running in some grass.
Tw large dogs rnnng in some grss. | throw large dogs running in some grss. | two large dogs running in some grass.
Two larg dog runnin in some grass. | two larg dog runnin in some grass. | a large dog running in some grass.
Twolarge dogs running income graas. | twolarge dogs running income graas. | two large dogs running into grass.
To lrg doog's rntng in sm gras .. | to long dogs ring in sm gras. | to the dogs running in the grass.

When using BERT-Defense without BERT, we directly select the best hypothesis from the context-independent probability distribution using GPT. To run BERT-Defense without GPT, we select the hypothesis with the highest probability according to the results from the modified Levenshtein distance and improve it using context-dependent probabilities obtained with BERT. We evaluate on the rd:0.3,rd:0.3 attack scenario, because we think that it is the most challenging attack.
The results are shown in Figure 7. They indicate that the most important component of BERT-Defense is the Levenshtein distance, as BERT often does not have enough context to meaningfully restore the sentences, given the difficult attacks from Zéroe that typically modify many words in each sentence. Removing BERT also considerably decreases the performance of the defense model. Finally, BERT-Defense without GPT performs on par with BD_agn in these experiments, suggesting that BERT-Defense can also be used without GPT for hypothesis selection.
More illustrative examples. To give an impression of the dataset and how the models cope with the adversarial attacks, we show more illustrative examples in Tables 2 and 5 (appendix). These indicate the superiority of our approach in that it typically generates semantically adequate sentences.

Conclusion
We introduced BERT-Defense, a model that probabilistically combines context-independent word level information obtained from edit distance with context-dependent information from BERT's masked language modeling to combat low-level orthographic attacks. Our model does not train on possible error types but still substantially outperforms a spell-checker as well as the model of Pruthi et al. (2019), which has been trained to shield against edit distance like attacks, on a comprehensive benchmark of cognitively inspired attack scenarios. We further show that our model rivals human crowd-workers supervised in a 3-shot manner. The generality of our approach allows it to be applied to a variety of different "normalization" problems, such as spelling normalization or OCR post-correction (Eger et al., 2016) besides the adversarial attack scenario considered in this work, which we will explore in future work.
We release our code and data at https://github.com/yannikkellerde/BERT-Defense.

Phonetic attacks. The phonetic embeddings implemented in Eger and Benz (2020) do not consistently produce phonetic attacks of sufficient quality. Thus, we used a many-to-many aligner (Jiampojamarn et al., 2007; Eger, 2015) together with the CMU Pronouncing Dictionary (cmudict) (University, 2014) and a word frequency list to calculate statistics for the correspondence between letters and phonemes. To attack a word, we convert the word to phonemes using cmudict and then convert it back to letters by sampling from the statistics. The perturbation probability p for this attack controls the sampling temperature, which describes how likely it is to sample letters that less frequently correspond to the phoneme in question. Using this method, we generate high-quality phonetically attacked sentences such as the one in Table 1.

A.3 Visual similarity
We calculate the visual similarity of 30000 Unicode characters to 26 letters and 10 numbers. Each glyph is drawn with Python's pillow library (Lundh and Clark, 2020) in 20pt using a fitting font from the Google Noto font collection. The bitmap is then cropped to contain only the glyph. Then the image is resized and padded on the right and bottom to be of size 30px × 30px. When comparing the bitmap of a Unicode glyph image and a letter/number glyph, multiple versions of the letter/number bitmap are created. For letters, the lowercase as well as the uppercase versions of each letter are taken. The bitmap gets downsized to 5 different sizes between 30px × 30px and 15px × 15px, rotated and flipped in all 8 unique ways and then padded to 30px × 30px again, such that the glyph is placed either at the top-left or the bottom-left. See Figure 8 for an example. The percentage of matching black pixels between bitmaps is calculated, and the highest matching percentage of all versions becomes the similarity score S. The substitution cost between two characters is then calculated based on the similarity with the equation cost = max(0, min(1, (0.8 - S) * 3)). The parameters of this equation have been tuned so that highly similar characters result in very low substitution costs while weakly similar characters have next to no reduction in substitution cost.
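The cost mapping from the text is a simple clamped linear function of the similarity score:

```python
def substitution_cost(S):
    """Map a visual similarity score S in [0, 1] to a Levenshtein
    substitution cost via cost = max(0, min(1, (0.8 - S) * 3))."""
    return max(0.0, min(1.0, (0.8 - S) * 3))
```

Characters with similarity at or above 0.8 substitute for free, while characters with similarity below roughly 0.47 keep the full cost of 1.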

A.4 Parameters, runtime and computing infrastructure
All experiments were run on a single machine using an Intel(R) Core(TM) i7-4790K processor and a Nvidia GeForce GTX 1070 Ti graphics card. The restoration of a single sentence in the experiments took on average 0.1 seconds using ScRNN Defense, 1.34 seconds for Pyspellchecker and 8 seconds for BERT-Defense. In total, BD_agn includes 5 free parameters, most of them controlling the temperature of the used softmax operations to ensure good relative weighting of the probability distributions. The parameter values are shown in Table 4. All additional parameters for BD_spec have been described in §3.1.

Parameter | Value
Softmax temperature for context-independent hypothesis | 10
Softmax temperature for context-independent word-probabilities | 1
Softmax temperature for BERT | 0.25
Softmax temperature for GPT | 0.005
Max number of hypotheses | 10

Attacked: theensuing battls abd airstrikes killed at peast 10 militqnts.
Ground-truth: the ensuing battle and airstrikes killed at least 10 militants.
BD_agn: the ensuing battle and air strikes killed at least 10 militants.
BD_spec: the ensuing battle and air strikes killed at least 10 militants.
ScRNN Defense: tunney battls and airstrikes killed at past 10 militqnts.
Pyspellchecker: theensuing battle abd airstrips killed at past 10 militants

Ground-truth: No, you don't need to have taken classes or earned a degree in your area.
BD_agn: no, you do ,' nee ,' not besides of never a degree, you are.
BD_spec: no, you do no' need to have taken classes or have a degree in your area.
ScRNN Defense: , yu so to nerve to era knaet access of need a degreeïn your areȃ.

Attacked: A man ix riding ;n s voat.
Ground-truth: A man is riding on a boat.
BD_agn: a man is riding in a boat.
BD_spec: a man is riding in a boat.
ScRNN Defense: a man imax riding on s voat.
Pyspellchecker: a man ix riding in s vote

Table 6: Exact scores for the results shown in Figure 5.