RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training



Introduction
Recent research has demonstrated the state-of-the-art performance of Pre-trained Language Models (PLMs) in learning contextual word embeddings (Devlin et al., 2019), leading to improved generalization in various Natural Language Processing (NLP) tasks (Yang et al., 2019; He et al., 2021; Ding et al., 2023). The focus of PLMs has extended to acquiring universal sentence embeddings, such as the Universal Sentence Encoder (USE) (Cer et al., 2018) and Sentence-BERT (Reimers and Gurevych, 2019), which effectively capture the semantic representation of the input text. This representation learning facilitates feature generation for classification tasks and enhances large-scale semantic search (Neelakantan et al., 2022).
The assessment of PLM-based sentence representations relies on two crucial characteristics: generalization and robustness. Considerable research effort has been devoted to developing universal sentence embeddings with PLMs (Reimers and Gurevych, 2019; Zhang et al., 2020; Ni et al., 2022; Neelakantan et al., 2022; Wang et al., 2023; Bölücü et al., 2023), and these embeddings perform well across various downstream classification tasks (Sun et al., 2019; Gao et al., 2021), demonstrating strong generalization. However, they exhibit limited robustness in adversarial settings and are vulnerable to diverse adversarial attacks (Nie et al., 2020; Wang et al., 2021). Existing research (Garg and Ramakrishnan, 2020; Wu et al., 2023; Hauser et al., 2023) highlights the poor robustness of these representations, such as BERT-based representations, which can be deceived by replacing just a few words in the input sentence.
In this paper, we propose RobustEmbed, a robust sentence embedding framework that takes both of these essential characteristics into account. The core concept involves introducing a small adversarial perturbation to the input text and employing the contrastive objective (Chen et al., 2020) to learn high-quality sentence embeddings. RobustEmbed perturbs the embedding space rather than the raw text, which exhibits a positive correlation with generalization and promotes higher invariance. Our framework uses the original embedding along with the perturbed embedding as "positive pairs," while other sentence embeddings in the same mini-batch serve as "negatives." The contrastive objective identifies the positive pairs among the negatives. By incorporating norm-bounded adversarial perturbations and contrastive objectives, our method enhances the robustness of similar sentences and disperses sentences with different semantics. This straightforward and efficient approach yields superior sentence embeddings on both generalization and robustness benchmarks.
We conduct extensive experiments on a wide range of text representation and NLP tasks to verify the effectiveness of RobustEmbed, including semantic textual similarity (STS) tasks (Conneau and Kiela, 2018), transfer tasks (Conneau and Kiela, 2018), and TextAttack (Morris et al., 2020). The first two series of experiments evaluate the quality of sentence embeddings on semantic similarity and natural language understanding tasks, while the last series assesses the robustness of the framework against state-of-the-art adversarial attacks. RobustEmbed demonstrates significant improvements in robustness, reducing the attack success rate from 75.51% to 39.62% for the BERTAttack technique and achieving similar improvements against other adversarial attacks. Additionally, the framework achieves performance improvements of 1.20% and 0.40% on STS tasks and NLP transfer tasks, respectively, when employing the BERT-base encoder.
Contributions. Our main contributions in this paper are summarized as follows:
• We introduce RobustEmbed, a novel self-supervised framework for sentence embeddings that generates robust representations capable of withstanding various adversarial attacks. Existing sentence embeddings are susceptible to such attacks, highlighting a vulnerability in their security. RobustEmbed fills this gap by employing high-risk perturbations within a novel contrastive learning approach.
• We conduct extensive experiments to demonstrate the efficacy of RobustEmbed across various text representation tasks and against state-of-the-art adversarial attacks. Empirical results confirm the high efficiency of our framework in terms of both generalization and robustness benchmarks.
• To facilitate further research in this important area, our source code is available in the RobustEmbed repository.

Related Work
The early work on text representations focused on applying the distributional hypothesis to predict words based on their context (Mikolov et al., 2013b,a). There are extensive studies on learning universal sentence embeddings using supervised and unsupervised approaches, such as Doc2vec (Le and Mikolov, 2014), SkipThought (Zhu et al., 2015), the Universal Sentence Encoder (Cer et al., 2018), and Sentence-BERT (Reimers and Gurevych, 2019). In comparison to several existing contrastive adversarial learning approaches in the text representation area (Yan et al., 2021; Meng et al., 2022; Qiu et al., 2021; Li et al., 2023; Rima et al., 2022; Pan et al., 2022), our framework stands out by generating more efficient high-risk iterative perturbations in the embedding space. Furthermore, our framework leverages a more powerful contrastive objective, leading to high-quality text representations with enhanced generalization and robustness properties. Empirical results substantiate the superiority of our approach across various generalization and robustness benchmarks.

Background
In this section, we present an overview of the recent progress in adversarial perturbation generation and self-supervised contrastive learning.

Adversarial Perturbation Generation
Adversarial perturbation involves adding maliciously crafted perturbations to benign data, which can deceive Machine Learning (ML) models, including deep learning methods (Goodfellow et al., 2015). These perturbations are designed to be imperceptible to humans but can cause the model to make incorrect predictions (Metzen et al., 2017). Adversarial training, which incorporates adversarial perturbations during model training, has been shown to enhance a model's robustness against adversarial attacks (Madry et al., 2018; Shafahi et al., 2020; Xu et al., 2020; Wang et al., 2019b). While various perturbation generation techniques have contributed to machine vision (Chakraborty et al., 2021), progress in the NLP domain has been slower due to the discrete nature of text (Jin et al., 2020). In recent years, instead of directly applying adversarial perturbations to raw text, a few studies have focused on perturbing the embedding space (Wang et al., 2019a; Dong et al., 2021). However, these methods still face challenges in terms of generalization, as they may not be applicable to arbitrary ML models and NLP tasks. The more generalized approach utilized within our framework applies a small noise δ within a norm ball to the embedding space, aiming to maximize the adversarial loss:

max_{∥δ∥ ≤ ε} L(f_θ(X + δ), y),  (1)

where f_θ(.) denotes an ML model parameterized by θ, X denotes the sub-word embeddings, and y is the ground-truth label. Various gradient-based algorithms have been proposed to solve this optimization problem. We employ a practical combination of the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) and Projected Gradient Descent (PGD) (Madry et al., 2018) to generate adversarial perturbations that represent worst-case examples.
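To make the norm-bounded updates concrete, here is a minimal numpy sketch of a single FGSM step and a single PGD step on a toy quadratic loss. The matrix W, the toy loss, and all shapes are illustrative assumptions, not the paper's encoder or training objective; the projection onto the ℓ∞ ball reduces to element-wise clipping.

```python
import numpy as np

# Toy differentiable "adversarial loss": L(delta) = 0.5 * ||W (X + delta)||^2.
# Its gradient w.r.t. delta is W^T W (X + delta). W, X, and the loss itself
# are illustrative stand-ins for the PLM encoder and its loss function.
def grad_loss(W, X, delta):
    return W.T @ W @ (X + delta)

def fgsm_step(W, X, delta, beta, eps):
    # FGSM: step along the sign of the gradient, then project onto the
    # l_inf ball of radius eps (projection here is element-wise clipping).
    delta = delta + beta * np.sign(grad_loss(W, X, delta))
    return np.clip(delta, -eps, eps)

def pgd_step(W, X, delta, alpha, eps):
    # PGD: step along the l2-normalized gradient, then project.
    g = grad_loss(W, X, delta)
    delta = delta + alpha * g / (np.linalg.norm(g) + 1e-12)
    return np.clip(delta, -eps, eps)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
X = rng.normal(size=4)
d_fgsm = fgsm_step(W, X, np.zeros(4), beta=0.01, eps=0.05)
d_pgd = pgd_step(W, X, np.zeros(4), alpha=0.01, eps=0.05)
```

In a real implementation the gradient would come from backpropagation through the encoder rather than an analytic formula; the projection step is what keeps the perturbation inside the ε-ball regardless of how the gradient is obtained.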

Contrastive Learning Based Representation
The objective of contrastive learning is to acquire effective low-dimensional representations by bringing semantically similar samples closer and pushing dissimilar ones further apart (Hadsell et al., 2006). Self-supervised contrastive learning has demonstrated promising results in data representation across domains such as machine vision (Chen et al., 2020), natural language processing (Gao et al., 2021; Neelakantan et al., 2022), and speech recognition (Lodagala et al., 2023). Our framework adopts the contrastive learning concept proposed by Chen et al. (2020) to generate high-quality representations. Let {(x_i, x_i^+)}_{i=1}^N denote a set of N positive pairs, where x_i and x_i^+ are semantically correlated, and let (z_i, z_i^+) be the corresponding embedding vectors for the positive pair (x_i, x_i^+). We define z_i's positive set as {x_i^pos} = {z_i^+}, while the negative set {x_i^neg} consists of the embeddings of the other positive pairs in the mini-batch. The contrastive training objective can then be defined as:

ℓ_i = − log ( exp(sim(z_i, z_i^+) / τ) / Σ_{j=1}^N exp(sim(z_i, z_j^+) / τ) ),  (2)

where τ denotes a temperature hyperparameter and sim(u, v) = u⊤v / (∥u∥ ∥v∥) is the cosine similarity between two representation vectors. The standard objective function contains only a single sample in the positive set. The total loss is computed over all positive pairs within a mini-batch.
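This in-batch contrastive objective can be sketched in a few lines of numpy: cosine similarities scaled by the temperature, with the diagonal entries playing the role of the true positive pairs. The batch contents and temperature value below are illustrative; real training code would use torch tensors so gradients can flow.

```python
import numpy as np

def contrastive_loss(Z, Z_pos, tau=0.05):
    """In-batch contrastive (InfoNCE-style) loss: row i of Z is an anchor z_i,
    row i of Z_pos is its positive z_i^+, and the other rows of Z_pos act as
    in-batch negatives."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Zp = Z_pos / np.linalg.norm(Z_pos, axis=1, keepdims=True)
    sim = (Z @ Zp.T) / tau                      # pairwise cosine similarity / tau
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # -log softmax of the true pairs

rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 16))
loss_aligned = contrastive_loss(Z, Z)               # perfect positive pairs
loss_shuffled = contrastive_loss(Z, Z[::-1].copy()) # mismatched positives
```

As expected, the loss is near zero when every anchor's positive is identical to it, and much larger when the positives are shuffled, since the true pair no longer dominates the softmax denominator.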

The Proposed Adversarial Self-Supervised Contrastive Learning

We introduce RobustEmbed, a simple yet effective approach for generating universal text representations through adversarial training of a self-supervised contrastive learning model. Given a PLM f_θ(.) as the encoder and a large unsupervised dataset D, RobustEmbed aims to pre-train f_θ(.) on D to enhance the efficiency of sentence embeddings across diverse NLP tasks (improved generalization) and to increase resilience against various adversarial attacks (enhanced robustness). Algorithm 1 shows how our framework generates a norm-bounded perturbation through an iterative process, confusing the f_θ(.) model by treating the perturbed embeddings as different instances. Our framework then employs a contrastive learning approach to maximize the similarity between the embedding of an input instance and the adversarial embedding of its positive pair. Moreover, Figure 1 provides an overview of the RobustEmbed framework, which aims to achieve adversarial robustness in representations. The framework involves an iterative collaboration between the perturbation generator and the f_θ(.) model to generate high-risk perturbations for adversarial contrastive learning in the final training step. The subsequent sections delve into the main components of our framework and provide a detailed analysis of the training objective.

Perturbation Generation
As the primary step, RobustEmbed aims to generate small perturbations that fool the ML model into incorrect predictions while remaining nearly imperceptible to humans. The framework uses an approach based on a combination of the PGD and FGSM algorithms to generate a perturbation that maximizes the self-supervised contrastive loss, facilitating discrimination between instances. RobustEmbed employs multiple iterations of this combination, specifically T-step FGSM and K-step PGD, to meticulously reinforce invariance within the embedding space, ultimately resulting in enhanced generalization and robustness.
In particular, given the PLM-based encoder f_θ(.) and an input sentence x, RobustEmbed passes the sentence through the f_θ(.) model twice: by applying the standard dropout twice, two different embeddings (X, X^+) are obtained as a "positive pair" (Gao et al., 2021). (Algorithm 1 takes as inputs the perturbation δ, dropout masks m_1 and m_2, the perturbation bound ε, step sizes α and β, learning rate η, perturbation modulator λ, regularization parameter γ, perturbation generation iteration counts K and T, and the contrastive learning objective L_con,θ (eq. 2).) The framework updates the perturbation separately for PGD and FGSM at iterations k + 1 and t + 1, respectively:

δ_{k+1} = Π_{∥δ∥_∞ ≤ ε} ( δ_k + α g(δ_k) / ∥g(δ_k)∥ ),  (3)

δ_{t+1} = Π_{∥δ∥_∞ ≤ ε} ( δ_t + β sign(g(δ_t)) ),  (4)

where g(δ_n) = ∇_δ L_con,θ(X + δ_n, {X^+}) with n = t or k is the gradient of the contrastive learning loss with respect to δ. The perturbation is generated within the ℓ_∞ norm ball of radius ε around the input embedding, and Π projects the perturbation onto the ε-ball. Further, α and β are the step sizes of the attacks, and sign(.) returns the sign of the vector. Essentially, T-step FGSM and K-step PGD are mathematically equivalent when p is either 2 or ∞; their primary distinctions lie in the number of iterations (T and K) and the step sizes (α and β) used to modify the input perturbation, ultimately generating a unique high-level perturbation. The final perturbation is obtained by combining T-step FGSM and K-step PGD:

δ = λ δ^{PGD} + (1 − λ) δ^{FGSM},  (5)

where 0 ≤ λ ≤ 1 modulates the relative significance of each separate perturbation in the generation of the final perturbation.
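The combined T-step FGSM plus K-step PGD generator can be sketched as follows. The `grad_fn` callback, the toy gradient used in the demo, and the convex-combination form of the final mix are assumptions of this illustration, not the paper's exact algorithm.

```python
import numpy as np

def generate_perturbation(grad_fn, X, eps=0.05, alpha=1e-2, beta=1e-2,
                          K=5, T=5, lam=0.5):
    """Sketch of the combined perturbation generator: K-step PGD and T-step
    FGSM are run independently from zero, each step projected onto the
    l_inf eps-ball, and the two results are mixed by the modulation factor
    lam. grad_fn(X, delta) stands in for the gradient of the contrastive
    loss with respect to the perturbation."""
    d_pgd = np.zeros_like(X)
    for _ in range(K):                               # K-step PGD
        g = grad_fn(X, d_pgd)
        d_pgd = np.clip(d_pgd + alpha * g / (np.linalg.norm(g) + 1e-12),
                        -eps, eps)
    d_fgsm = np.zeros_like(X)
    for _ in range(T):                               # T-step FGSM
        d_fgsm = np.clip(d_fgsm + beta * np.sign(grad_fn(X, d_fgsm)),
                         -eps, eps)
    return lam * d_pgd + (1.0 - lam) * d_fgsm        # modulated combination

# Toy gradient function, purely for demonstration.
toy_grad = lambda X, d: X + d
delta = generate_perturbation(toy_grad, np.ones(6))
```

Because both component perturbations are clipped to the ε-ball and 0 ≤ λ ≤ 1, the mixed perturbation also stays inside the ball.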

Robust Contrastive Learning
To achieve robust representations through self-supervised contrastive learning, the adversarial learning objective, which follows a min-max formulation to minimize the maximum risk over any perturbation δ (Madry et al., 2018), can be defined as follows:

min_θ E_{x ∼ D} [ max_{∥δ∥ ≤ ε} L_con,θ(X + δ, {X^+}) ],  (6)

where X + δ is the adversarial embedding generated by the iterative gradient-based perturbation generation (eq. 5). Our framework uses adversarial examples generated in the embedding space, rather than the original raw text, resulting in a pre-trained model that is robust against m-way instance-wise adversarial attacks. The framework employs the contrastive learning objective to maximize the similarity between clean examples and their adversarial perturbations by incorporating the adversarial example as an additional element in the positive set:

L_RobustEmbed,θ := L_con,θ(x, {x^pos, x^adv}),  (7)

L_total := L_RobustEmbed,θ + γ L_con,θ(x^adv, {x^pos}),  (8)

where x^adv represents the adversarial perturbation of the input sample x in the embedding space, and γ denotes a regularization parameter. The first part of the total contrastive loss (eq. 8) optimizes the similarity between the input sample x, its positive pair, and its adversarial perturbation, while the second part regularizes the loss by encouraging the convergence of the adversarial perturbation and the positive pair of x.
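The two-part objective of eqs. 7 and 8 can be sketched for a single anchor as follows. The cosine-based InfoNCE helper, the single-anchor simplification, and the specific vectors are all illustrative assumptions rather than the paper's batched implementation.

```python
import numpy as np

def info_nce(anchor, positives, negatives, tau=0.05):
    """Contrastive loss with a (possibly multi-element) positive set: each
    positive contributes a -log(pos / (pos + sum_neg)) term, and the loss
    averages over the positive set. Single-anchor numpy simplification."""
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    neg_mass = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    terms = [np.exp(cos(anchor, p) / tau) for p in positives]
    return -np.mean([np.log(t / (t + neg_mass)) for t in terms])

def total_loss(x, x_pos, x_adv, negatives, gamma=1.0):
    l_robust = info_nce(x, [x_pos, x_adv], negatives)  # eq. 7: both positives
    l_reg = info_nce(x_adv, [x_pos], negatives)        # eq. 8 regularizer
    return l_robust + gamma * l_reg

rng = np.random.default_rng(2)
x = rng.normal(size=8)
loss = total_loss(x,
                  x + 0.01 * rng.normal(size=8),       # dropout-style positive
                  x + 0.05 * rng.normal(size=8),       # adversarial embedding
                  [rng.normal(size=8) for _ in range(4)])
```

The regularizer term is what pulls x^adv and x^pos directly together, rather than only indirectly through their shared anchor x.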

Evaluation and Experimental Results
This section presents a comprehensive set of experiments aimed at validating the effectiveness of our proposed framework in terms of generalization and robustness metrics. In the first two series of experiments, we investigate the performance of our framework on seven semantic textual similarity (STS) tasks and six transfer tasks within the SentEval framework to assess its generalization capability in generating efficient sentence embeddings. In the final series of experiments, we measure the resilience of the embeddings against five state-of-the-art adversarial attacks to assess the robustness of our framework in generating robust text representations. Appendices A and B provide training details and ablation studies that illustrate the effects of hyperparameter tuning.

Transfer Tasks
This experiment leverages transfer tasks to evaluate the performance of our framework, RobustEmbed, on diverse text classification tasks, including sentiment analysis and paraphrase identification. Our assessment encompasses six transfer tasks: MR (Pang and Lee, 2005b), CR (Hu and Liu, 2004), SUBJ (Pang and Lee, 2004), MPQA (Wiebe et al., 2005), SST2 (Socher et al., 2013), and MRPC (Dolan and Brockett, 2005), with detailed information provided in Appendix E. We adhere to the standard methodology described in Conneau and Kiela (2018) and train a logistic regression classifier on top of the fixed sentence embeddings. We replicated the SimCSE, ConSERT, and USCAL frameworks using our configuration for both BERT and RoBERTa encoders. The results presented in Table 2 indicate that our framework achieves superior average accuracy compared to other sentence embedding methods. Specifically, with the BERT encoder, our framework outperforms the second-best embedding method by a margin of 0.40%. Moreover, RobustEmbed achieves the highest score in four of the six text classification tasks. Similar observations hold for the RoBERTa encoder, in both its base and large versions.

Adversarial Attacks
In this section, we evaluate the robustness of our sentence embedding framework against various adversarial attacks, comparing it with two state-of-the-art sentence embedding models: SimCSE (Gao et al., 2021) and USCAL (Miao et al., 2021).
Our evaluation involves fine-tuning a BERT-based PLM using different embedding approaches on seven text classification and natural language inference tasks, including MRPC (Dolan and Brockett, 2005) and the natural language inference tasks SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018). Detailed information regarding these tasks can be found in Appendix E. To assess the robustness of the fine-tuned models, we perform adversarial attacks using the TextAttack framework (Morris et al., 2020) to investigate the impact of five efficient adversarial attack techniques: TextBugger (Li et al., 2019), PWWS (Ren et al., 2019), TextFooler (Jin et al., 2020), BAE (Garg and Ramakrishnan, 2020), and BERTAttack (Li et al., 2020b). To provide more comprehensive insight into how these attacks operate, we give further details in Appendix F. It should be noted that adaptive attacks cannot be built on the main algorithm of our framework, as it operates exclusively in the embedding space while the inputs to sentence embeddings are raw text. To ensure statistical validity, each experiment was conducted five times, each time using 1000 adversarial attack samples; the results reported in this section are the averages over the five iterations. Table 3 presents the attack success rates of five adversarial attack techniques on three sentence embeddings, including our framework. Our embedding framework consistently outperforms the other two embedding methods, demonstrating significantly lower attack success rates across all text classification and natural language inference tasks. Consequently, RobustEmbed achieves the lowest average attack success rate against all adversarial attack techniques. These findings validate the robustness of our embedding framework and highlight the vulnerabilities of the two state-of-the-art sentence embeddings to various adversarial attacks.
Figure 2 depicts the average number of queries required and the resulting accuracy reduction for a set of 1000 attacks on two fine-tuned sentence embeddings. Green data points represent attacks on the RobustEmbed framework, while red points represent attacks on the USCAL approach (Miao et al., 2021). Connected pairs of points correspond to specific attack techniques. Ideally, a robust sentence embedding should be situated in the top-left region of the diagram, indicating that the attack technique requires a larger number of queries to deceive the target model while causing minimal performance degradation. The figure illustrates that, for each attack, RobustEmbed exhibits greater stability than the USCAL method: a larger number of queries is required for RobustEmbed, resulting in a lower accuracy reduction (i.e., better performance) compared to USCAL. This observation holds for all applied adversarial attacks, indicating the robustness of our framework.

Robust Embeddings
We introduce a new task called Adversarial Semantic Textual Similarity (AdvSTS) to evaluate the resilience of sentence embeddings within our representation framework. AdvSTS uses an efficient adversarial approach, such as TextFooler, to manipulate a pair of input sentences so that the target model produces a regression score that deviates as much as possible from the true score (the ground-truth label). Consequently, we create an adversarial STS dataset by converting all benign instances from the original dataset into adversarial examples. As in the STS task, AdvSTS employs Pearson's correlation to measure the agreement between the similarity scores predicted by the target model and the human-annotated similarity scores on the adversarial dataset.
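AdvSTS scoring thus reduces to the standard Pearson correlation between predicted and gold similarity scores; the score vectors below are made-up illustrations, not results from the paper.

```python
import numpy as np

# Hypothetical gold (human-annotated) and model-predicted similarity scores
# for five adversarial sentence pairs, on the usual 0-5 STS scale.
gold = np.array([4.8, 1.2, 3.5, 0.4, 2.9])
pred = np.array([4.5, 1.0, 3.9, 0.8, 2.5])

r = np.corrcoef(gold, pred)[0, 1]   # Pearson's r, as used for STS/AdvSTS
```

A successful adversarial rewrite drives the model's prediction away from the gold score, which lowers r; a robust embedding keeps r close to its value on the benign dataset.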
Table 4 illustrates the attack success rates of five adversarial attack techniques (TextFooler, TextBugger, PWWS, BAE, and BERTAttack) applied to three sentence embeddings, including our framework. These evaluations are carried out for two AdvSTS tasks, AdvSTS-B and AdvSICK-R. Notably, our embedding framework consistently outperforms the other two embedding methods, showing significantly lower attack success rates across both AdvSTS tasks and all employed adversarial attack techniques.
In conclusion, the extensive experiments conducted and the results presented in Tables 1, 2, 3, and 4, as well as Figure 2, provide strong evidence of the exceptional performance of RobustEmbed across various text representation and classification tasks, as well as its resilience against various adversarial attacks and tasks. These findings support the notion that our framework possesses remarkable generalization and robustness capabilities, underscoring its potential as an efficient and versatile approach for generating high-quality sentence embeddings.

Distribution of Sentence Embeddings
Following the methodology proposed by Wang and Isola (2020), we employ two critical evaluation metrics, termed alignment and uniformity, to assess the quality of our representations. In the context of positive pairs drawn from the distribution p_pos, alignment calculates the expected distance between the embeddings of paired instances:

ℓ_align := E_{(x, x^+) ∼ p_pos} ∥f(x) − f(x^+)∥²,

Uniformity quantifies how uniformly the embeddings are distributed within the representation space:

ℓ_uniform := log E_{x, y ∼ p_data} e^{−2 ∥f(x) − f(y)∥²},

where p_data represents the data distribution. The underlying principle of these metrics is that positive instances should remain closely grouped, while embeddings for random instances should be spread across the hypersphere. Figure 3 illustrates the uniformity and alignment of various sentence embedding models, where lower values correspond to better performance. In comparison to alternative representations, RobustEmbed achieves a similar level of uniformity (−2.293 vs. −2.305) but demonstrates superior alignment (0.058 vs. 0.073). This highlights the greater efficiency of our framework in optimizing the representation space along two distinct directions.
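The two metrics can be sketched in numpy following the Wang and Isola (2020) convention; the unit-normalization, the t = 2 kernel in the uniformity term, and the sample embeddings are assumptions of this sketch.

```python
import numpy as np

def alignment(X, Y):
    """Expected squared distance between unit-normalized positive-pair
    embeddings; lower means positives stay closer together."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return float(np.mean(np.sum((X - Y) ** 2, axis=1)))

def uniformity(X, t=2.0):
    """log of the mean Gaussian kernel over all distinct embedding pairs;
    more negative means embeddings spread more uniformly on the sphere."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    off_diag = sq[~np.eye(X.shape[0], dtype=bool)]   # drop self-pairs
    return float(np.log(np.mean(np.exp(-t * off_diag))))

rng = np.random.default_rng(3)
E = rng.normal(size=(32, 64))                        # illustrative embeddings
a = alignment(E, E + 0.01 * rng.normal(size=E.shape))
u = uniformity(E)
```

Identical positive pairs give an alignment of exactly zero, and any non-degenerate set of embeddings yields a negative uniformity value, matching the convention that lower is better for both metrics.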

Conclusion and Future Work
In this paper, we proposed RobustEmbed, a self-supervised sentence embedding framework that significantly enhances robustness against various adversarial attacks while achieving state-of-the-art generalization performance on semantic textual similarity and transfer tasks.

Limitations
Despite the ingenuity of our methodology and its impressive performance, our framework has some potential limitations:
• Our framework is primarily designed and optimized for descriptive models, such as BERT, which excel at understanding and representing language and at related tasks like text classification. However, it may not be directly applicable to generative models like GPT, which prioritize generating coherent and contextually relevant text. Therefore, there may be limitations in applying our methodology to enhance the generalization and robustness of generative pre-trained models.
• Our framework requires significant GPU resources for pre-training large-scale pre-trained models such as RoBERTa-large. Due to limitations in GPU availability, we had to use smaller batch sizes during pre-training. While larger batch sizes (e.g., 256 or 512) generally lead to improved performance metrics, our experiments had to compromise and use smaller batch sizes to generate sentence embeddings efficiently within the GPU resource constraints.

A Training Details
In our experimental setup, we initialize our sentence encoder, denoted as f_θ, using checkpoints from BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019). For sentence embedding, RobustEmbed uses the representation of the [CLS] token as the starting point and adds a pooler layer on top of the [CLS] representation to facilitate the contrastive learning objective.
The training process of RobustEmbed involves 2 epochs, with model evaluation conducted every 250 training steps. The best checkpoint, determined by the highest average STS (Semantic Textual Similarity) score, is selected for final evaluation. To train the model, we use a dataset of 10^6 randomly sampled sentences from English Wikipedia, as provided by the SimCSE framework (Gao et al., 2021). The average training time for RobustEmbed is 2-4 hours. As our framework is initialized from pre-trained checkpoints, its performance is not sensitive to batch size, enabling us to use batch sizes of either 64 or 128. For the transfer tasks, we determine the best hyperparameters based on the averaged score over the development sets of the six transfer tasks.
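The [CLS]-plus-pooler setup described above can be sketched as a small numpy stub. The weight shapes, initialization scale, and tanh activation follow the common BERT-style pooler convention and are assumptions of this sketch, not the paper's exact head.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 768                                   # BERT-base hidden size

# Hypothetical pooler head: a linear layer + tanh over the [CLS] vector.
W = rng.normal(scale=0.02, size=(hidden, hidden))
b = np.zeros(hidden)

def pool(last_hidden_state):
    """Map the encoder output (tokens x hidden) to a sentence embedding by
    taking the first ([CLS]) token and applying the pooler layer."""
    cls = last_hidden_state[0]                 # [CLS] is the first token
    return np.tanh(W @ cls + b)

sentence_embedding = pool(rng.normal(size=(12, hidden)))   # a 12-token sentence
```

In the actual framework this pooled vector is what enters the contrastive objective; at inference time some embedding methods drop the pooler and use the raw [CLS] vector instead.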

B Ablation Studies
In this section, we analyze the influence of four key hyperparameters on the overall performance of our approach. We use BERT-base as the encoder and evaluate the hyperparameters on the development sets of the STS tasks.

B.1 Step Sizes in Perturbation Generation
As depicted in Algorithm 1, the RobustEmbed framework incorporates two step sizes, denoted α and β, to perform iterative updates during the PGD and FGSM perturbation generation processes, respectively. Figure 4 illustrates the joint effect of varying these two step sizes on generating high-risk perturbations, which is important for an effective contrastive learning objective. The results show greater improvement when α lies in a lower range and β in an upper range; specifically, better performance is observed when α and β fall within [1e-6, 1e-4] and [1e-4, 1e-2], respectively. We therefore use α = 1e-5 and β = 1e-3 in our experiments, as this setting achieves the best results among the tested configurations.

B.2 Step Numbers in Perturbation Generation
RobustEmbed applies T-step FGSM and K-step PGD iterations to obtain high-risk adversarial perturbations for the contrastive learning objective. To simplify the analysis, we set K = T. Figure 5 shows the impact of different step numbers (N = K = T) on effectiveness. We observe a gradual improvement as N increases from 1 to 9; beyond N = 9, the improvement becomes negligible. Moreover, a higher N leads to longer running times and greater resource consumption. Balancing effectiveness against cost, we select N = 5 for our experiments.

B.3 Norm Constraint
To ensure the imperceptibility of the generated adversarial examples, RobustEmbed controls the magnitude of the perturbation δ. Three commonly used norm functions, namely L1, L2, and L∞, are employed to restrict the magnitude of δ to small values; Table 5 reports the influence of each norm constraint on average STS performance.

B.4 Modulation Factor
RobustEmbed incorporates a modulation factor, denoted as 0 ≤ λ ≤ 1, to adjust the relative significance of each separate perturbation (PGD and FGSM) in the formation of the final perturbation.
The performance efficiency of various values of this modulation factor on semantic textual similarity tasks is presented in Table 6.

D Contrastive Learning Loss
The first part of the total contrastive loss (eq. 8) optimizes the similarity between the input instance x and its positive pair x^pos, along with the similarity between x and its adversarial perturbation x^adv. Although this indirectly brings x^pos and x^adv closer, our observations show that regularizing the main objective function (eq. 7) through direct contrastive learning between x^pos and x^adv (the second part of eq. 8) yields improved clean accuracy and robustness.

E Text Classification Tasks
This section presents additional information on the text classification tasks used to assess the generalization and robustness capabilities of our framework in comparison to various sentence embedding methods. The MR (Movie Reviews) dataset (Pang and Lee, 2005b) consists of sentence-level samples with sentiment polarity, comprising 8,530 training and 1,066 testing highly polar instances. The CR dataset (Hu and Liu, 2004) is a customer review dataset collected in three steps: extracting products with customer comments, identifying opinion sentences, and labeling each sentence as positive or negative. The SUBJ dataset (Pang and Lee, 2004) contains 5,000 subjective and 5,000 objective sentences from movie reviews, labeled based on subjectivity status and polarity. The MPQA dataset (Wiebe et al., 2005) includes annotated documents from diverse news sources, categorizing opinion states such as beliefs, emotions, sentiments, and speculations. The SST2 dataset (Socher et al., 2013) is a sentence-level dataset with 8,544 training and 2,210 testing highly polar samples, extracted from movie reviews and classified as negative or positive. The MRPC dataset (Dolan and Brockett, 2005) consists of sentence pairs from news articles, labeled by human annotators to indicate semantic equivalence relationships.

F Adversarial Attack Methods
This section presents additional details on the diverse adversarial attack techniques employed to assess the robustness of our sentence embedding framework. The TextBugger method (Li et al., 2019) identifies important words using the Jacobian matrix of the target model and selects an optimal perturbation from five types of generated perturbations. The PWWS method (Ren et al., 2019) uses a synonym-swap technique based on a combination of word saliency scores and maximum word-swap effectiveness. TextFooler (Jin et al., 2020) identifies important words, gathers synonyms, and replaces each important word with the most semantically similar and grammatically correct synonym. The BAE method (Garg and Ramakrishnan, 2020) employs four adversarial attack strategies involving word replacement and/or word insertion, where a portion of the text is masked and the BERT MLM is used to generate substitutions. The BERTAttack method (Li et al., 2020b) consists of two steps: (a) searching for vulnerable words/sub-words and (b) using the BERT MLM to generate semantics-preserving substitutes for the vulnerable tokens.

Figure 1: The general architecture of the RobustEmbed framework. In the contrastive learning step, blue arrows indicate pulling positive pairs together, and red arrows indicate keeping distance between negative pairs.

Figure 2: Average number of queries and the resulting accuracy reduction for a set of 1000 attacks on two fine-tuned sentence embeddings. Green points represent attacks on the RobustEmbed framework, while red points represent attacks on the USCAL approach.

Figure 3: ℓ_align − ℓ_uniform plot of models based on BERT-base.

Figure 4: The impact of step sizes in perturbation generation on the average performance of STS tasks.

Figure 5: The effect of the step number (N = K or T) in the T-step FGSM and K-step PGD methods on the averaged correlation over the different Semantic Textual Similarity (STS) tasks.

Table 1: Semantic similarity performance on STS tasks (Spearman's correlation, "all" setting) for sentence embedding models. We emphasize the top-performing numbers among models that share the same pre-trained encoder. ♡: results from (Reimers and Gurevych, 2019); ♣: results from (Gao et al., 2021). All remaining results, including those of the SimCSE, ConSERT, and USCAL frameworks, have been reproduced and reevaluated by our team. The ⋆ symbol marks our framework.

Table 2: Performance on transfer tasks for different sentence embedding models. ♣: results from (Reimers and Gurevych, 2019); ♡: results from (Zhang et al., 2020). We emphasize the top-performing numbers among models that share the same pre-trained encoder. All remaining results have been reproduced and reevaluated by our team. The ⋆ symbol marks our framework.

Table 3: Attack success rates of various adversarial attacks applied to three sentence embeddings (SimCSE-BERT, USCAL-BERT, and RobustEmbed-BERT) across five text classification and two natural language inference tasks.
Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, and Lidong Bing. 2020. An unsupervised sentence embedding method by mutual information maximization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 1601-1610. Association for Computational Linguistics.

Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, and Jingjing Liu. 2020. FreeLB: Enhanced adversarial training for natural language understanding. In International Conference on Learning Representations.

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, pages 19-27.

Table 5: The influence of the norm constraint on perturbation generation on the average performance across various Semantic Textual Similarity (STS) tasks.

The results indicate that λ = 0.5 achieves the highest averaged correlation among the tested values, indicating its effectiveness in generating more powerful perturbations. Therefore, we adopt this setting in the configuration of our framework.

Table 6: The impact of the modulation factor on the average performance across different Semantic Textual Similarity (STS) tasks in generating the final perturbation.

Table 8
The YELP Polarity Review (YELP) dataset (Zhang et al., 2015) consists of document-level samples, with 560,000 training and 38,000 testing highly polar instances classified as negative (1- and 2-star) or positive (4- and 5-star) reviews. The Internet Movie Database (IMDb) Review dataset (Maas et al., 2011) contains 25,000 training and 25,000 testing highly polar samples, with negative and positive classes corresponding to review scores of ≤4 and ≥7 out of 10, respectively. Rotten Tomatoes Movie Reviews (MR) (Pang and Lee, 2005a) is a sentence-level dataset consisting of 8,530 training and 1,066 testing highly polar samples, where negative and positive classes are assigned based on calibration among different critics. SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2018) are three-class datasets comprising 550,152 and 392,702 training pairs and 10,000 and 19,643 testing human-written English sentence pairs, respectively. Each set of three pairs in SNLI is created using a different image caption from the Flickr30K dataset (Young et al., 2014), while MNLI draws on ten sources of text; the premise sentence serves as the first sentence in each set. The hypothesis sentences of the first, second, and third pairs are generated to be in entailment, contradiction, and neutral relations with the respective premise sentence. While SNLI uses premise sentences from a relatively homogeneous image caption dataset, MNLI covers a broader range of text styles. The MNLI testing pairs are divided into two categories, "Matched" and "Mismatched," where MNLI-Matched pairs share similar context and greater resemblance to the training pairs than MNLI-Mismatched pairs.