PerturbScore: Connecting Discrete and Continuous Perturbations in NLP

With the rapid development of neural network applications in NLP, the problem of model robustness is gaining more attention. Unlike in computer vision, the discrete nature of text makes it more challenging to explore robustness in NLP. In this paper, we therefore aim to connect discrete perturbations with continuous perturbations, so that such connections can serve as a bridge for understanding discrete perturbations in NLP models. Specifically, we first explore how to connect and measure the correlation between discrete and continuous perturbations. We then design a regression task, PerturbScore, to learn the correlation automatically. Experimental results show that a connection between discrete and continuous perturbations can be built and that the proposed PerturbScore learns this correlation, surpassing previous methods for measuring discrete perturbations. Further, the proposed PerturbScore generalizes well to different datasets and perturbation methods, indicating that it can serve as a powerful tool for studying model robustness in NLP.


Introduction
Natural language processing (NLP) applications based on neural networks are developing rapidly, exemplified by applications based on pre-trained models (Devlin et al., 2018) such as ChatGPT (Brown et al., 2020), machine translation systems (Bahdanau et al., 2014), and question-answering systems (Rajpurkar et al., 2016). While they are growing at an incredible speed, how far we can trust these neural networks is of great concern. Therefore, exploring model robustness in NLP is essential for future NLP developments. Model robustness studies mostly focus on exploring model behavior when the inputs are perturbed. However, unlike in the computer vision field, the discrete nature of natural language makes it more challenging to define, construct, and measure the perturbations added to texts.
Previous work usually treats discrete and continuous perturbations as two separate lines of research. In the computer vision (CV) field, continuous perturbations are widely explored (Goodfellow et al., 2014), since studying perturbations can help improve model robustness (Madry et al., 2019) and generalization abilities (Hendrycks and Dietterich, 2019; Hendrycks et al., 2021). As for perturbations in NLP, the difficulty of handling discrete perturbations constrains the development of model robustness. Jin et al. (2019), Zang et al. (2020), and Li et al. (2020) craft adversarial examples with synonyms as word-level perturbations, for which it is hard to measure whether the perturbations are imperceptible. Crafting these perturbations also faces a combinatorial explosion problem. As studying discrete perturbations is more challenging than studying continuous perturbations, it is natural to ask: can we find the correlations between discrete and continuous perturbations in NLP and study continuous perturbations instead?
In this paper, we aim to explore connections between discrete and continuous perturbations in NLP models, hoping that such connections can help studies of model robustness in NLP. We first give definitions and measuring standards of perturbations in discrete and continuous space and align the form and notation for studying their correlations. Then we make several assumptions that make it possible to connect discrete and continuous perturbations. Specifically, we assume that discrete perturbations and continuous perturbations have similar effects on neural models when they are added to the input. Therefore, we are able to search for a continuous perturbation that can be considered a substitute for the discrete perturbation. We then introduce a method to quantify the correlations between discrete and continuous perturbations and design a regression task to learn the correlation automatically. Specifically, we use the projected gradient descent method to search for a minimum continuous perturbation that has an effect on model behavior similar to that of the discrete perturbation. After quantifying the correlations between discrete and continuous perturbations, we use an additional neural network named PerturbScorer to learn such a correlation. That is, given the original input and a discrete perturbation, we use a neural network to predict its continuous perturbation range.
We conduct experiments on the IMDB and AG's News datasets, which are widely used in NLP robustness studies. We first explore the correlations between discrete and continuous perturbations when they are used to perturb fine-tuned models such as BERT; then we test the performance of the learned network and empirically verify that the correlations between perturbations can be learned by a neural network, providing evidence that researchers can study continuous perturbations in NLP as a substitute for discrete perturbations.
Further, we design extensive analytical experiments, and from the experimental results we draw several non-trivial conclusions: (1) we can build a connection between discrete and continuous perturbations; (2) such a connection can be generalized and can help improve model robustness and generalization abilities; (3) continuous-space adversarial training is effective because it narrows the gap between discrete and continuous space.
To summarize, in this paper we explore the correlations between discrete and continuous perturbations, a fundamental challenge in robustness studies in NLP. We provide detailed notation and make assumptions to explore the correlations between perturbations, and we propose a method to connect these perturbations; further, we design a PerturbScorer to learn such correlations; through experimental results, we show that we can connect discrete perturbations with continuous perturbations. We hope that the concept of studying discrete perturbations in NLP by building connections to continuous perturbations can provide hints for future studies.

Model Robustness and Perturbations
Robustness problems are widely explored in the deep learning field: Goodfellow et al. (2014) and Carlini and Wagner (2016) discussed the possibility of crafting gradient-based perturbations as adversarial examples to mislead neural models. Madry et al. (2019) introduce the projected gradient descent method to construct perturbations. Hendrycks and Dietterich (2019) and Hendrycks et al. (2021) discussed more general perturbations such as Gaussian noise and blurs in distribution shift scenarios. When the perturbations are continuous, studies focus on exploring connections between model robustness and model accuracy (Zhang et al., 2019a; Yang et al., 2020), and plenty of analytical works dive deep into model robustness (Pinot et al., 2019). These robustness studies assume that the perturbations are continuous; therefore, they are not suitable for discrete perturbations and NLP robustness studies.

Perturbations in NLP
In the NLP field, the robustness problem becomes more challenging due to the discrete nature of texts. Ebrahimi et al. (2017) explore crafting character-level and word-level perturbations as adversarial examples to attack NLP models. Follow-up works such as Jin et al. (2019), Zang et al. (2020), and Li et al. (2020) aim to find better methods to craft semantic-preserving adversaries. As for more general perturbations, Jia and Liang (2017) explore how adding random sentences can mislead question-answering systems; Yi et al. (2021) explore how adversarial training improves out-of-distribution generalization. Unlike the continuous perturbations explored in the computer vision field, the NLP field rarely discusses how the model behaves in terms of robustness against adversaries and generalization abilities. Zhu et al. (2019) introduce embedding-space gradient-based adversarial training and discover that continuous-space adversaries can help improve NLP model generalization abilities, without further explanation. Li et al. (2021) find that gradient-based adversarial training can be used to defend against word-substitution attacks. In general, robustness studies in NLP rarely focus on finding the correlation between discrete and continuous perturbations, a gap that separates work in the vision and language fields.

Defining Perturbations
We first define perturbations in deep neural networks for NLP applications: given an input text S with embedding output X, the prediction of the input S is denoted as f(X). Here, we use the embedding output X as the model input since we aim to connect the discrete perturbations with continuous perturbations in the embedding space. When the input text is maliciously attacked or perturbed by random noise, the input text becomes S′. We use P(S) to denote the perturbation process, so the perturbed text is S′ = S + P(S). The perturbation function P(S) can be one of various methods, including adversarial attacks and random perturbations. Representative adversarial attack methods are word-substitution adversarial attacks such as HotFlip (Ebrahimi et al., 2017), Textfooler (Jin et al., 2019), and BERT-Attack (Li et al., 2020). Unlike random perturbations such as randomly deleting or replacing words or characters, adversarial attack methods aim to find the minimum number of character- or word-level substitutions that can mislead target models.
Unlike in the computer vision field, where continuous perturbations can be directly added to the input, continuous perturbations in the language field can only be added to the embedding output X. For an embedding output X ∈ R^{l×d} with sequence length l and hidden size d, the perturbed output is X′ = X + δ. The continuous perturbation δ ∈ R^{l×d} can be random noise (Hendrycks and Dietterich, 2019), such as Gaussian noise, blurs, or pixelation, or an adversarial perturbation. A representative method to generate an adversarial perturbation δ is the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014). Given a target model f_θ(·), the perturbation of sample S is generated based on the gradients, which in the standard FGSM form reads:
$$ \delta = \alpha \cdot \mathrm{sign}\left( \nabla_X \mathcal{L}(f_\theta(X), y) \right) $$
Here, \mathcal{L} is the training loss, y is the label of S, and α is a hyper-parameter controlling the perturbation range.
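As an illustration of the FGSM step described above, the following is a minimal sketch on a toy logistic model rather than a fine-tuned BERT. The weight vector `w`, the flattened input `x`, and the helper names are all hypothetical stand-ins introduced here; the toy input plays the role of the embedding output X.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient(x, w, y):
    """Gradient of binary cross-entropy w.r.t. the input x for p = sigmoid(w . x)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * wi for wi in w]

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def fgsm_perturbation(x, w, y, alpha):
    """One FGSM step: delta = alpha * sign(gradient of the loss w.r.t. x)."""
    return [alpha * sign(g) for g in loss_gradient(x, w, y)]

# Toy stand-ins (assumptions): a flattened "embedding output" and linear weights.
x = [0.5, -1.0, 2.0]
w = [1.0, -2.0, 0.5]
delta = fgsm_perturbation(x, w, y=1, alpha=0.1)
x_adv = [xi + di for xi, di in zip(x, delta)]
```

With a real model, the gradient would come from automatic differentiation through the network instead of the closed form used for this toy classifier.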

Measuring Perturbations
After defining perturbations, it is important to measure how perturbations affect neural models.
Measuring the severity of a perturbation is a challenge for discrete text perturbations, because the similarity between the perturbed and original texts cannot be easily measured. We use A(P(S)) to measure the intensity of the perturbation P(S) added to the original text S, which could be edit distance, semantic shift, grammar change, etc. For instance, when we use edit distance as A(P(S)), we assume that the fewer tokens of the original text are replaced, the less the text is perturbed. Besides edit distance, Jin et al. (2019) use USE (Cer et al., 2018) to measure the perturbation intensity; in that case, we assume that the less semantic information is changed, the less the text is perturbed. In general, finding an accurate measuring strategy A(·) is challenging, since the standard for measuring discrete perturbations can be diverse and subjective.
On the other hand, continuous perturbations can be measured by constraining the ℓ_p-norm ||δ||_p of the perturbation δ. The most common constraint is the ℓ_2-norm. Compared with measuring discrete perturbations, measuring the continuous perturbation range is easy and straightforward.
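The two kinds of measurement can be contrasted with a small sketch. It assumes word-substitution perturbations that keep the token count unchanged, so that edit distance reduces to the number of substituted tokens, and it represents δ as a nested list of shape l × d. Both function names are hypothetical.

```python
import math

def substitution_count(original_tokens, perturbed_tokens):
    """Edit distance for equal-length word substitution: number of replaced tokens."""
    if len(original_tokens) != len(perturbed_tokens):
        raise ValueError("this sketch only handles substitutions of equal length")
    return sum(a != b for a, b in zip(original_tokens, perturbed_tokens))

def l2_norm(delta):
    """l2-norm of a continuous perturbation given as an l x d nested list."""
    return math.sqrt(sum(v * v for row in delta for v in row))

orig = "it would recall the old days".split()
pert = "it would reminds the old days".split()
n_subs = substitution_count(orig, pert)       # discrete measure A(P(S))
norm = l2_norm([[3.0, 0.0], [0.0, 4.0]])      # continuous measure ||delta||_2
```

The discrete measure depends on a choice of A(·), while the continuous one is a single norm computation.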

Connecting Perturbations
As illustrated, it is challenging to measure discrete perturbations, which makes it more difficult to explore how perturbations affect model behavior.
Constructing discrete perturbations is also challenging: for instance, replacing discrete tokens in a multi-token text with multiple candidates for each token is a combinatorial explosion problem. Therefore, since measuring and studying continuous perturbations is more convenient, instead of searching for methods to evaluate the shift caused by discrete perturbations, we aim to build a connection between discrete and continuous perturbations and explore how the continuous perturbation affects model behavior instead. We hope that by connecting discrete perturbations to continuous perturbations, we can introduce new perspectives on model robustness problems in NLP.
To build the connection between discrete perturbations and continuous perturbations, we make several assumptions:
Assumption 1. Continuous perturbations δ added to X have effects on the target model f_θ similar to those of discrete perturbations added to S. That is, both types of perturbations can cause a model prediction shift, and stronger perturbations cause more damage to the model.
Assumption 2. The target model f_θ follows a Lipschitz constraint: when ||δ||_2 < ϵ,
$$ \| f_\theta(X + \delta) - f_\theta(X) \| \le K \, \|\delta\|_2 $$
for some constant K. Here, we assume that in NLP models, such as a fine-tuned BERT, small perturbations in the embedding space do not cause severe damage to model outputs. Otherwise, the behavior change caused by input perturbations is hard to predict, and finding correlations between perturbations is more challenging.
Assumption 3. For a discrete perturbation P(S), there exists a continuous perturbation δ that satisfies ϵ − ε < ||δ||_2 < ϵ, where ε is a small interval, and such a δ satisfies
$$ \left| f_\theta(S + P(S)) - f_\theta(X + \delta) \right| < \phi $$
where ϕ is a hyper-parameter and, for simplification, f_θ(·) takes both discrete tokens S and the embedding output X of the discrete tokens S as input, skipping the embedding process.
We assume that the continuous perturbation has a similar effect on the model f_θ(·); therefore, when the absolute value of the gap between the model shifts caused by the discrete and continuous perturbations is small, we consider the two perturbations equal in their effect on the neural model. We can then use continuous perturbations as an approximation of discrete perturbations by building connections between them.
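One reading of Assumption 3 can be written as a small predicate. This is a sketch under stated assumptions: the model outputs are treated as scalars (for example, the probability assigned to the gold label), and the function and argument names are hypothetical.

```python
def satisfies_assumption_3(delta_norm, eps, interval, out_discrete, out_continuous, phi):
    """Check both conditions of Assumption 3 for one candidate perturbation.

    delta_norm     -- l2-norm of the continuous perturbation
    eps, interval  -- norm bound and the small search interval
    out_discrete   -- scalar model output on the discretely perturbed text (assumed scalar)
    out_continuous -- scalar model output on the continuously perturbed embedding
    phi            -- tolerance on the gap between the two outputs
    """
    in_shell = eps - interval < delta_norm < eps
    similar_effect = abs(out_discrete - out_continuous) < phi
    return in_shell and similar_effect

ok = satisfies_assumption_3(0.195, 0.2, 0.01, 0.62, 0.618, 0.005)
too_far = satisfies_assumption_3(0.195, 0.2, 0.01, 0.62, 0.60, 0.005)
```

The first call passes both conditions; the second fails because the output gap of 0.02 exceeds the tolerance.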

Quantify Connections
After assuming that we can build connections between discrete and continuous perturbations, we aim to quantify such connections. For a discrete perturbation P(S), we find the minimum continuous perturbation δ under Assumption 3. Specifically, we aim to find the proper norm-bound ϵ such that, under this norm-bound, there exists a perturbation δ that satisfies Assumption 3. Therefore, when S and P(S) are fixed, the goal is to find a norm-bound ϵ:
$$ \epsilon = \min \left\{ \epsilon' : \exists\, \delta \ \text{with} \ \epsilon' - \varepsilon < \|\delta\|_2 < \epsilon' \ \text{that satisfies Assumption 3} \right\} $$
We thus obtain a data tuple [S, P(S), ϵ], which represents the correlation between discrete and continuous perturbations. We hope to empirically verify that the elements of the data tuple can be connected, and thereby verify the assumptions made above.
Algorithm 1: Obtaining norm-bound ϵ
Require: inputs X, S, P(S), label y, search step
[Algorithm listing incomplete in the source. Surviving steps: a loop "for t = 0, 1, ..., T_a do", a "// Get Gradients" step, and a final "discard tuple" step.]
In practice, to obtain ϵ, we use a standard projected gradient descent (PGD) method (Madry et al., 2019). As seen in Algorithm 1, we use multi-step gradient descent to generate perturbations within the range ϵ, which is the perturbation generation used in gradient-based adversarial training. Specifically, the projection used in line 7 constrains the perturbations within the norm bound ϵ. In line 9, we pick the ϵ that satisfies the assumption that there exists a continuous perturbation with an effect on neural models similar to that of the discrete perturbation. Therefore, once the perturbation δ achieves the desired effect on the neural model (larger than the effect caused by the discrete perturbation), we consider the ϵ found to be the proper one. As a special case, if the continuous perturbation norm ||δ||_2 is much larger than Γ, we simply drop the sample.
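The search described above can be sketched as follows. This is not the authors' Algorithm 1: it is a minimal illustration on a toy logistic model, assuming a grid search over ϵ from small to large, an acceptance test that the continuous shift reaches the discrete shift, and a cap `max_eps` standing in for the discard threshold. All names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Toy stand-in for f_theta: probability of the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def project_l2(delta, eps):
    """Project delta onto the l2 ball of radius eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps or norm == 0.0:
        return delta
    return [d * eps / norm for d in delta]

def pgd_shift(w, x, y, eps, steps, alpha):
    """Run PGD inside the eps-ball; return the largest output shift reached."""
    base = predict(w, x)
    delta = [0.0] * len(x)
    best = 0.0
    for _ in range(steps):
        p = predict(w, [xi + di for xi, di in zip(x, delta)])
        grad = [(p - y) * wi for wi in w]  # closed-form loss gradient of the toy model
        gnorm = math.sqrt(sum(g * g for g in grad)) or 1.0
        delta = [di + alpha * g / gnorm for di, g in zip(delta, grad)]  # ascent step
        delta = project_l2(delta, eps)  # keep the perturbation within the norm bound
        shift = abs(predict(w, [xi + di for xi, di in zip(x, delta)]) - base)
        best = max(best, shift)
    return best

def find_norm_bound(w, x, y, discrete_shift, interval=0.01, steps=15,
                    alpha=0.1, max_eps=1.0):
    """Smallest eps on a grid of width `interval` whose PGD shift reaches discrete_shift."""
    k = 1
    while k * interval <= max_eps:
        eps = k * interval
        if pgd_shift(w, x, y, eps, steps, alpha) >= discrete_shift:
            return eps
        k += 1
    return None  # no suitable eps below the cap: discard the tuple

w = [1.0, -2.0, 0.5]
x = [0.5, -1.0, 2.0]
eps = find_norm_bound(w, x, y=1, discrete_shift=0.05)
discarded = find_norm_bound(w, x, y=1, discrete_shift=0.99)
```

Searching from the smallest bound upward returns the first ϵ at which the continuous perturbation matches the discrete one, which is the sense of "minimum" used in the text; a tuple whose shift is never reached below the cap is dropped.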

PerturbScorer
After constructing the quantification of correlations between discrete and continuous perturbations, we design a PerturbScorer to score the correlations.
The perturbation δ is a continuous variable; therefore, we formulate a regression task to learn the range ϵ given S and P(S). We train this task as the PerturbScorer M([S, P(S)], ϵ) to measure the correlation between discrete and continuous perturbations in the target model f_θ(·).
Considering that the perturbation P(S) should be small, we use a simple strategy that concatenates the original text and the perturbations in P(S) as the input of the regression task. We simply insert each perturbation after the original token it replaces (e.g., ". . . it would recall [reminds] . . ."). Such patterns help the model understand how the original text was perturbed. We then use the crafted inputs to predict the norm bounds of the corresponding continuous perturbations.
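The input construction described above can be sketched as a short helper. It assumes whitespace tokenization and equal-length word substitution; the function name is hypothetical and the bracket format follows the example given in the text.

```python
def build_scorer_input(original_tokens, perturbed_tokens):
    """Insert each substitute, in brackets, right after the original token it replaces."""
    if len(original_tokens) != len(perturbed_tokens):
        raise ValueError("this sketch only handles substitutions of equal length")
    pieces = []
    for orig, pert in zip(original_tokens, perturbed_tokens):
        pieces.append(orig)
        if pert != orig:
            pieces.append(f"[{pert}]")
    return " ".join(pieces)

scorer_input = build_scorer_input(
    "it would recall the old days".split(),
    "it would reminds the old days".split(),
)
```

The resulting string keeps the original sentence readable while marking exactly where and how it was perturbed, so a single-sequence regression model can consume it directly.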

Dataset Construction
To explore the correlations between discrete and continuous perturbations, we use several datasets widely used in exploring model robustness in NLP. We use the IMDB dataset (Maas et al., 2011) and the AG's News dataset (Zhang et al., 2015), which are text classification tasks with average text lengths of 220 and 47, respectively.
We use two widely used perturbation methods P(S): Textfooler (Jin et al., 2019) and random perturbation. In the Textfooler method, we follow the standard generation process and save the perturbations from multiple queries regardless of the attack result, which differs from its original usage of searching for perturbations until the attack succeeds. For each input text S, we have multiple P(S) with different edit distances.
In the random perturbation method, we randomly replace a token with a random word from a general vocabulary, namely the vocabulary used to obtain synonyms in the Textfooler method (Mrkšić et al., 2016; Jin et al., 2019). As with the Textfooler method, we collect multiple perturbations per text, varying the number and positions of substitutions as well as the substitutes themselves.
To generate continuous perturbations, we use the PGD method to find the minimum continuous perturbation that causes a model prediction shift similar to that of the discrete perturbation. Specifically, we set the number of adversarial steps T_a = 15 and the adversarial learning rate α = 1e-1; the norm-ball search interval ε of ϵ is set to 0.01 and the discard parameter ϕ is set to 0.005.
In Figure 1, we draw curves exploring the connection between model output shifts and different levels of perturbations. The average model output shifts caused by discrete and continuous perturbations are consistent with edit distance and norm-ball range, respectively. We observe that as the perturbations grow larger, both discrete and continuous perturbations cause more damage to neural models. In addition, the models trained by the FreeLB method show better resistance against both discrete and continuous perturbations. These results support Assumption 1 and show that discrete and continuous perturbations can be correlated.
[Table 2 column header: range of ϵ, binned as [0, 0.1], (0.1, 0.2], (0.2, 0.3], (0.3, 0.4], (0.4, 0.5], (0.5, 0.6], (0.6, 1).]
In Table 2, we count the number of tuples in different ϵ ranges for the data tuples constructed from multiple datasets, to show the connection between discrete and continuous perturbations. We discard only a small proportion of the collected data tuples, indicating that we can successfully find a norm ball ϵ that satisfies Assumption 3, that is, a continuous perturbation δ that is equivalent to the discrete perturbation P(S) in interfering with the neural model f_θ(·). We also observe that, while the discrete perturbations P(S) are uniformly distributed over the edit-distance range from 1 to 30 in the IMDB dataset and from 1 to 15 in the AG's News dataset, the continuous perturbations mostly fall in the range ϵ < 2e-1, indicating that most discrete perturbations with different edit distances (that is, different numbers of substitutions) correspond to only a small continuous perturbation. Therefore, learning the correlation between these discrete and continuous perturbations can help understand the discrete perturbations. Further, compared with random perturbation, the Textfooler method generates more discrete perturbations that strongly damage model predictions, and the corresponding continuous perturbations require larger norm balls, indicating that stronger discrete perturbations correspond to larger continuous perturbations and making it possible to connect discrete perturbations with continuous perturbations.
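Counting tuples per ϵ range, as in Table 2, can be sketched with a simple binning helper. The bin edges follow the ranges that survive from the table header; the helper names and the sample values are hypothetical and do not reproduce the paper's data.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the closed-on-the-right bins; values above 0.6 go in the last bin.
UPPER_EDGES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
LABELS = ["[0, 0.1]", "(0.1, 0.2]", "(0.2, 0.3]", "(0.3, 0.4]",
          "(0.4, 0.5]", "(0.5, 0.6]", "(0.6, 1)"]

def bin_label(eps):
    """Map a norm bound to its range label; bisect_left keeps bins right-closed."""
    return LABELS[bisect_left(UPPER_EDGES, eps)]

def count_by_range(eps_values):
    """Number of data tuples falling into each epsilon range."""
    counts = Counter(bin_label(e) for e in eps_values)
    return {label: counts.get(label, 0) for label in LABELS}

counts = count_by_range([0.03, 0.1, 0.15, 0.18, 0.45, 0.7])  # illustrative values only
```

Discarded tuples have no ϵ and are simply left out of the input list.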
For the collected data, we use 80% of the data tuples as the training set and 20% as the test set for training and testing the PerturbScorer.

Evaluating Quantification of Correlation
After constructing the discrete perturbations and finding the corresponding ϵ of these discrete perturbations, we are able to explore whether the discrete and continuous perturbations can be connected and show similar effects on neural networks. To evaluate the quantification process of correlations illustrated in Sec. 3.4, we use the Kendall and Pearson correlation coefficients to measure whether the discrete perturbations and the continuous perturbations can be connected.
The goal is to measure the correlation coefficients, such as the Kendall and Pearson coefficients, between A(P(S)) and the selected norm-bound ϵ. If the correlation between A(P(S)) and ϵ is strong, this supports Assumption 3, which assumes that there exists a continuous perturbation equal to the discrete perturbation in interfering with neural models. We use several simple choices of A(·), including edit distance, BERTScore (Zhang et al., 2019b), and USE (Cer et al., 2018). BERTScore and USE measure the similarity between two sentences, which runs in the opposite direction to edit distance and the perturbation scorer; therefore, we use the negated BERTScore and USE scores as A(·) when measuring the correlation coefficients.
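The evaluation described above can be sketched in plain Python. The Kendall coefficient here is the simple tau-a variant without tie correction, chosen for brevity; library implementations such as SciPy's offer tie-corrected variants. The similarity scores and norm bounds are illustrative values only, and the function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Similarity scores (higher = less perturbed) are negated so that, like edit
# distance, a larger A means a stronger perturbation.
similarity = [0.95, 0.90, 0.80, 0.70]   # illustrative values only
a_scores = [-s for s in similarity]
eps_found = [0.05, 0.12, 0.10, 0.30]    # illustrative norm bounds

tau = kendall_tau_a(a_scores, eps_found)
r = pearson(a_scores, eps_found)
```

The same two functions apply unchanged when the first argument is the ϵ predicted by the PerturbScorer instead of A(P(S)).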
Further, we can directly measure the correlation coefficients between the predicted ϵ and the ϵ found by the search, exploring whether the PerturbScorer can learn the connection between perturbations. This supports the assumptions made in Sec. 3.3 and provides a powerful tool to quantify discrete perturbations for robustness studies in NLP.

PerturbScorer Training
The training process of the PerturbScorer follows the standard fine-tuning process used for regression tasks such as the STS-B dataset (Cer et al., 2017), using huggingface Transformers (Wolf et al., 2019). We set the learning rate to 5e-5 with batch sizes of 64 and 128 for the IMDB and AG's News datasets, respectively, and use 4x Nvidia 3090 GPUs to run the PerturbScorer training process.

Table 3: PerturbScorer evaluation results and correlation comparison with evaluators including BERTScore, USE, and Edit-Distance.

Correlation Quantification Results
In Table 3, we list the correlation coefficients of different perturbation measuring methods and of the correlation learned by the PerturbScorer. We observe that when we use scorers A(·) to measure the discrete perturbation, the correlation between A(·) and the obtained continuous perturbation bound is not significant. Across setups with different datasets and perturbation methods, the Kendall correlation is smaller than 0.6 and the Spearman correlation is smaller than 0.8, indicating that the discrete perturbation measuring methods do not correlate closely with the continuous perturbations that have a similar effect on neural models, and further suggesting that these measuring methods cannot properly measure the damage to neural models.
On the other hand, when we use the PerturbScorer M(·) to predict the corresponding continuous perturbations, the correlation scores are high enough to show that the PerturbScorer can learn the connection between the discrete perturbations and the continuous perturbations. Such results show that we can use our proposed PerturbScorer as a powerful tool to build a connection between discrete and continuous perturbations.

Analysis
Having made assumptions about the correlations between discrete and continuous perturbations, we construct the data tuples and design a PerturbScorer to explore whether the correlation can be learned and generalized by neural networks. By exploring the correlations, we obtain non-trivial observations that can be helpful for model robustness in NLP:

Lipschitz Constraint Tightness
As we observe in Figure 1, the model shift is in direct proportion to the perturbation range, indicating that the target model f_θ follows a Lipschitz constraint on a general scale. Further, as seen in Table 3, compared with the model trained by the FreeLB method, it is more challenging to learn the correlation when the normally fine-tuned BERT is used as f_θ(·). This indicates that gradient-based adversarial training helps build a tighter connection between discrete and continuous perturbations, and it offers a perspective on why gradient-based adversarial training helps improve robustness and generalization performance in NLP tasks with discrete inputs (Zhu et al., 2019; Li et al., 2021).

PerturbScorer Generalization
In Table 3, we show that the correlation between the discrete perturbation P(S) and the continuous perturbation range ϵ of model f_θ(·) can be learned by a PerturbScorer M(·). We further explore whether building such a correlation can be applied to various scenarios in NLP robustness studies, that is, we explore the generalization ability of the PerturbScorer M(·). We test whether the PerturbScorer M(·) learned on a target model f_θ(·) generalizes across datasets, across perturbation methods P(S), and across models f_θ(·), so that the PerturbScorer and the concept of learning the correlation between discrete and continuous perturbations can be used in various scenarios. We list thorough results in the Appendix. We explore how the PerturbScorer M(·) performs on different datasets or when facing different types of perturbations:

Cross-Perturbation PerturbScorer
In cross-perturbation tests, we observe that when we train the PerturbScorer with random perturbations as P(S), the PerturbScorer can learn perturbations generated by Textfooler, while a PerturbScorer trained on Textfooler-generated perturbations does not generalize well. Such results show that we can collect multiple types of perturbations to train a PerturbScorer that generalizes to various discrete perturbations, as a powerful tool to quantify how discrete perturbations affect neural models.

Cross-Dataset PerturbScorer
In cross-dataset tests, we observe that when we test the AG's News data tuples using the PerturbScorer trained on the IMDB dataset, the correlation is weakened but still stronger than the correlation with edit distance. This indicates that the PerturbScorer we train can be generalized to different datasets and that the connection between discrete and continuous perturbations is strong in general NLP systems, which makes it possible to use such correlations in various NLP robustness scenarios.

Cross-Model PerturbScorer
In general, the correlation between discrete and continuous perturbations depends on the neural model f_θ(·), since the perturbation range ϵ is calculated based on a certain model f_θ(·).
However, when we test the generalization ability between different neural models f_θ(·), we observe that the correlation is still close. Therefore, it is possible to build a more general PerturbScorer as a general metric to score discrete perturbations.

Combined Scorer
We further build a combined PerturbScorer that is trained on a mixture of data tuples covering different datasets, perturbation methods, and neural models, to explore a more generalized scenario.
As seen in Table 4, when we train a model on the mixed data collected, we can build a general PerturbScorer that successfully predicts the correlations between discrete perturbations and the corresponding continuous perturbation ranges. Such a result shows that it is possible to build a general PerturbScorer that can be used across different datasets and perturbation types, and that we can use continuous perturbations as a proxy for discrete perturbations when studying NLP robustness problems.

Conclusion and Future Directions
In this paper, we focus on a fundamental problem in robustness studies in NLP: the discrete nature of texts. The discrete nature isolates NLP robustness studies from well-studied machine learning fields; therefore, we introduce the concept of building connections between discrete and continuous perturbations as a new perspective for exploring NLP robustness. We build a PerturbScorer to learn the correlation between discrete and continuous perturbations and find that such a PerturbScorer can learn the connection between perturbations, allowing us to use continuous perturbation ranges as a proxy constraint on discrete perturbations, which avoids the challenge that discrete perturbations are hard to measure. Further, we find that our proposed PerturbScorer can be generalized to different datasets and perturbation methods, indicating that such a process can be further applied in future NLP robustness studies. For future directions, we aim to explore more effective methods to build a stronger PerturbScorer and to explore broader scenarios in which to use the proposed PerturbScorer.

Limitations
In this work, we explore discrete perturbations in robustness studies in NLP. We aim to find correlations between discrete perturbations and continuous perturbations, since continuous perturbations are easily measured and well studied in the computer vision field. Our work assumes that discrete perturbations have an effect on neural networks similar to that of continuous perturbations; one limitation of this assumption is that a similar effect does not strictly make the two types of perturbations equal in nature. We find one perspective for connecting discrete and continuous perturbations, which is not the only solution. Future work can explore stricter constraints and find stronger connections between discrete and continuous perturbations.
Also, better PerturbScorer designs and applications based on the correlations between discrete and continuous perturbations and on PerturbScorers can be further explored. We focus on defining and building the connection between discrete and continuous perturbations, and we do not explore further applications based on these connections and our proposed PerturbScorer. For instance, as the PerturbScorer can be used to score discrete perturbations, it can be used to recognize differences between sentences or to measure distribution shifts. Also, previous works explore robustness and generalization trade-offs and explainable robustness theories in continuous space, mostly in the computer vision field; our work reveals the potential to explore these problems in NLP, which can be pursued in future work.
Further, as large language models (LLMs) are drawing much attention in the NLP community, how strong LLMs behave in connecting discrete and continuous space perturbations remains unexplored, especially as GPT-4 (OpenAI, 2023) is known to support images and texts. As these models are not open-source, we leave exploring perturbations in LLMs to future work.

Table 1: Selected samples with same edit-distance perturbations showing discrete perturbation constructions and measurements of discrete perturbations. The model output shifts are different from edit-distance or BERTScore.

Table 2: Statistics of the pair number of constructed correlations between discrete and continuous perturbations.

Table 4: PerturbScorer generalization tests on cross dataset, perturbation type and model. We also test a combined PerturbScorer trained with multi-perturbations, multi-datasets and multi-model generated data tuples.