Dim-Krum: Backdoor-Resistant Federated Learning for NLP with Dimension-wise Krum-Based Aggregation

Despite the potential of federated learning, it is known to be vulnerable to backdoor attacks. Many robust federated aggregation methods have been proposed to reduce the potential backdoor risk, but they are mainly validated in the CV field. In this paper, we find that NLP backdoors are harder to defend against than CV backdoors, and we provide a theoretical analysis showing that the malicious update detection error probabilities are determined by the relative backdoor strengths. NLP attacks tend to have small relative backdoor strengths, which may result in the failure of robust federated aggregation methods against NLP attacks. Inspired by the theoretical results, we choose dimensions with higher backdoor strengths to settle this issue. We propose a novel federated aggregation algorithm, Dim-Krum, for NLP tasks, and experimental results validate its effectiveness.


Introduction
Despite the potential of federated learning, which allows multiple clients to learn collectively without the risk of private data leakage, it is known to be vulnerable to backdoor attacks, where backdoor attackers (Gu et al., 2019) or trojaning attackers (Liu et al., 2018b) inject backdoor patterns into neural networks to alter the predicted label to a desired target label on instances carrying such patterns for malicious purposes.
To reduce the potential backdoor risk of the FedAvg (McMahan et al., 2017) aggregation method, many robust federated aggregation methods have been proposed. Among them, a line of Byzantine tolerant gradient descent algorithms detects and discards abnormal or malicious parameter updates with higher distances to their neighbors, e.g., Krum (Blanchard et al., 2017), Multi-Krum (Blanchard et al., 2017), and Bulyan (Mhamdi et al., 2018). Besides the Krum algorithms, there is another line of robust aggregation methods (Chen et al., 2020a; Pillutla et al., 2019; Fung et al., 2020; Xie et al., 2021; Fu et al., 2019; Wan and Chen, 2021) that do not discard abnormal or malicious updates.
Even though some existing robust federated aggregation strategies (Xie et al., 2021; Wan and Chen, 2021) are designed to defend against backdoor attacks from malicious clients, they are mainly validated on tasks and backdoor patterns in the Computer Vision (CV) field; their defense performance in the Natural Language Processing (NLP) field is less explored. In this paper, we validate these aggregation methods on NLP attacks and find that they fail to generate robust server updates even when only one out of ten clients is malicious, which demonstrates that federated NLP backdoors are harder to defend against than CV backdoors; similar observations are also reported by the experiments in Wan and Chen (2021).
To explain the difference in attack difficulties between CV and NLP backdoors, we provide a theoretical analysis illustrating that the relative backdoor strengths indicate detection difficulties: poisoned parameter updates with smaller relative backdoor strengths are harder to detect. Empirical observations reveal that NLP backdoors tend to have smaller relative backdoor strengths, which may explain the failure of robust federated aggregation methods against NLP attacks.
To settle this issue, even though NLP attacks tend to have smaller relative backdoor strengths overall, we can choose dimensions with higher backdoor strengths to detect abnormal or malicious updates. Empirical trials show that the theoretical detection error probability decreases significantly when only a small fraction of dimensions is chosen for defending against NLP attacks. Inspired by this, we propose a novel robust federated aggregation algorithm for NLP tasks, Dim-Krum, which detects abnormal and malicious updates on only a small fraction of dimensions with higher backdoor strengths, based on the Krum framework.
To enhance Dim-Krum, we also propose a memory mechanism for better distance-sum estimation and an adaptive noise mechanism for mitigating potential backdoors in malicious updates.
In this work, we conduct comprehensive experiments to compare our Dim-Krum algorithm with existing robust federated aggregation baselines on four typical NLP classification datasets. We adopt four typical NLP backdoor attacks, EP (Yang et al., 2021a; Yoo and Kwak, 2022), BadWord (Chen et al., 2020b), BadSent (Chen et al., 2020b), and HiddenKiller (Qi et al., 2021), which cover typical poisoning techniques in NLP backdoors. Experimental results show that Dim-Krum outperforms existing baselines and can work as a strong defense in federated aggregation. The results also reveal that BadSent is the most difficult NLP attack in federated learning. Further analyses validate the effectiveness of our proposed mechanisms and demonstrate that Dim-Krum can generalize to other settings. We also explore potential adaptive attacks and find that Dim-Krum is not vulnerable to them.
Our contributions are summarized as follows: • We take the first step toward comprehensive experiments on NLP federated backdoors against existing defenses and find that NLP federated backdoors are harder to defend against during aggregation than CV backdoors.
• We provide a theoretical analysis explaining the difficulty of NLP federated backdoor defense: the relative backdoor strengths are smaller in NLP attacks, while detecting backdoors on only a small fraction of dimensions can alleviate this issue.
• We propose a backdoor-resistant federated aggregation algorithm, Dim-Krum, for NLP learning. Experimental results validate the effectiveness of our proposal.

Background and Related Work
In this section, we introduce robust aggregation algorithms in federated learning, as well as backdoor attacks and defenses in the NLP domain. We describe in detail the typical algorithms adopted in our experiments.

Robust Federated Aggregation
The robustness of federated learning includes defending against adversaries and backdoors.

Backdoor Attack
Our work mainly focuses on the NLP domain. Backdoor attacks in the NLP domain usually adopt data poisoning (Muñoz-González et al., 2017; Chen et al., 2017) similar to BadNets (Gu et al., 2019), and can be roughly categorized according to the backdoor pattern chosen for the poisoned instances: (1) Trigger word based attacks (Kurita et al., 2020; Yang et al., 2021a; Zhang et al., 2021b; Yang et al., 2021c) choose low-frequency trigger words as the backdoor pattern. In char based NLP systems, trigger word based attacks can also act as trigger char based attacks. Among them, the embedding poisoning attack (EP) (Yang et al., 2021a) only manipulates the word embeddings of the trigger word for better stealthiness and attack performance. Several training algorithms (Yang et al., 2021a; Zhang et al., 2021b,a; Yang et al., 2021c) have been proposed for better stealthiness and consistency of trigger word based attacks. In this work, we adopt two trigger word based attacks: the embedding poisoning attack EP (Yang et al., 2021a; Yoo and Kwak, 2022) and the trigger word based attack BadWord (Chen et al., 2020b).
(2) Trigger sentence based attacks choose a neutral sentence, which does not influence the semantics of the task, as the trigger pattern. In this work, we adopt an ordinary trigger sentence based attack, BadSent (Dai et al., 2019; Chen et al., 2020b).
(3) Hidden trigger based attacks (Saha et al., 2020; Salem et al., 2020; Qi et al., 2021) and dynamic attacks (Nguyen and Tran, 2020; Qi et al., 2021) are sophisticated attacks that aim to hide the backdoor trigger or adopt input-aware dynamic triggers for better stealthiness. In this work, we adopt the HiddenKiller (Qi et al., 2021) attack, which uses a syntax pattern as the trigger.
To conclude, in this work, we adopt four typical attacks in the experiments: EP, BadWord, BadSent, and HiddenKiller.

Rethinking Aggregation for NLP
In this section, we analyze the detection difficulties of malicious clients and compare CV and NLP backdoors. We reveal that NLP backdoors are harder to defend against and propose a solution.

Preliminary
Federated Learning. Let w^{server} denote the global weights, or global model parameters, on the server. The objective of federated learning is

min_{w^{server}} { L(w^{server}) := Σ_{i=1}^{n} L_i(w^{server}) },

where n denotes the number of clients, L denotes the loss function, and L_i denotes the loss function on the local dataset of the i-th client.
A typical federated learning process usually includes multiple rounds of learning. Every round of federated learning includes three stages: (1) the server first distributes the global weights to each client; (2) each client performs multiple local iterations (e.g., one epoch) to update the local weights (McMahan et al., 2017); and (3) the server gathers the local updates and updates the global weights with a federated aggregation algorithm.
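The three stages above can be sketched in a toy simulation; the quadratic client objectives, learning rate, and step counts below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fedavg_round(w_server, client_grads, lr=0.1, local_steps=5):
    """One round: distribute weights, run local SGD per client, average updates."""
    updates = []
    for grad_fn in client_grads:                 # stage (1): each client starts from w_server
        w_local = w_server.copy()
        for _ in range(local_steps):             # stage (2): local iterations
            w_local -= lr * grad_fn(w_local)
        updates.append(w_local - w_server)       # local update x^(i)
    return w_server + np.mean(updates, axis=0)   # stage (3): FedAvg aggregation

# toy quadratic objectives with a different optimum per client
rng = np.random.default_rng(0)
optima = [rng.normal(size=4) for _ in range(10)]
grads = [lambda w, m=m: 2 * (w - m) for m in optima]

w = np.zeros(4)
for _ in range(30):
    w = fedavg_round(w, grads)
# w approaches the mean of the client optima
```

With quadratic losses, each round contracts the distance between the global weights and the average of the client optima, mirroring how FedAvg averages local progress.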
Define the update of a local model in the k-th round of federated learning as the k-th update. Let w^{(i)}_{j,t=k} denote the j-th dimension of the local weights after the k-th update of the i-th client, and w^{server}_{j,t=k} the j-th dimension of the global weights after the k-th update of the server. In stage (1), the local weights of each client are set to the global weights, namely w^{(i)}_{t=k} is initialized to w^{server}_{t=k−1}. In stage (2), each client updates the local weights. Let x^{(i)}_{j,t=k} denote the j-th dimension of the k-th local update of the i-th client, where the subscripts j and t=k are omitted when clear from context, namely,

x^{(i)}_{j,t=k} = w^{(i)}_{j,t=k} − w^{server}_{j,t=k−1}.

In stage (3), the server gathers the local updates {x^{(i)}_{t=k}}_{i=1}^{n} and updates the global weights. Let A({x^{(i)}}_{i=1}^{n}) denote the aggregation method that aggregates {x^{(i)}}_{i=1}^{n}, namely,

w^{server}_{t=k} = w^{server}_{t=k−1} + A({x^{(i)}_{t=k}}_{i=1}^{n}).

Federated Aggregation. Many robust federated aggregation algorithms can be formulated as

A({x^{(i)}}_{i=1}^{n}) = Σ_{i=1}^{n} p_i x^{(i)}, where Σ_{i=1}^{n} p_i = 1 and p_i ≥ 0.

Abnormal clients suspected of uploading poisoned backdoor updates should be assigned a lower p_i for defense. FedAvg (McMahan et al., 2017) adopts p_i proportional to the local dataset size (p_i = 1/n with equally sized datasets), ResidualBase (Fu et al., 2019) estimates p_i with residuals of the i-th client, and AAA (Wan and Chen, 2021) estimates p_i with a self-attention mechanism; p_i > 0 usually holds in these algorithms. The Krum (Blanchard et al., 2017) algorithms detect abnormal clients and set the corresponding p_i = 0, which may act as a stronger defense than merely setting a small positive p_i. Let S be the set of normal clients that are not suspected to be poisonous; the Krum algorithms set p_i as

p_i = 1/|S| if i ∈ S, and p_i = 0 otherwise.

Byzantine Tolerant Aggregation (Krum). The Krum (Blanchard et al., 2017) algorithm, namely the Byzantine tolerant aggregation, detects the set S of normal clients that are not suspected to be poisonous by estimating the distance-sum of the i-th client,

Dis-Sum^{(i)} = Σ_{j∈N_i} d_{ij},

namely the sum of distances between the i-th client and its nearest neighbors, where N_i is the set of the indexes of the n − f − 2 nearest neighbors of client i (f denotes the assumed number of malicious clients) and d_{ij} is the distance between the updates of clients i and j. Following Wan and Chen (2021), we adopt d_{ij} = ‖w^{(i)}_{t=k} − w^{(j)}_{t=k}‖_2 in our implementation. The choice of S is determined by the distance-sums Dis-Sum^{(i)}. Define i* as the client with the smallest distance-sum; in the initial Krum algorithm, S = {i*}; in the Multi-Krum algorithm, S = N_{i*}; and in the Bulyan algorithm, the set S is chosen iteratively under the Krum framework. Our Dim-Krum is mainly based on the framework of the Multi-Krum algorithm, but differs in the calculation of the distances d_{ij}.
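The Multi-Krum selection described above can be sketched as follows; the toy updates and the assumed number of malicious clients f are illustrative, and the neighbor count n − f − 2 follows the standard Krum formulation.

```python
import numpy as np

def multi_krum_select(updates, f):
    """Select the normal set S = N_{i*} under the Multi-Krum framework.

    updates: (n, d) array of client updates x^(i); f: assumed number of
    malicious clients. Each client's score Dis-Sum^(i) is the sum of
    distances to its n - f - 2 nearest neighbours."""
    n = len(updates)
    m = n - f - 2
    dists = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=-1)
    scores, neighbours = [], []
    for i in range(n):
        order = np.argsort(dists[i])[1:m + 1]      # skip self (distance 0)
        neighbours.append(order)
        scores.append(dists[i][order].sum())       # Dis-Sum^(i)
    i_star = int(np.argmin(scores))
    S = {i_star, *neighbours[i_star].tolist()}
    return S, i_star

rng = np.random.default_rng(1)
clean = rng.normal(0, 0.1, size=(9, 20))
poisoned = clean[0] + 5.0                          # one strongly deviating update
updates = np.vstack([clean, poisoned[None, :]])
S, i_star = multi_krum_select(updates, f=1)
# the strongly deviating client (index 9) is excluded from S
```

The aggregated update would then average only the updates of clients in S, i.e., p_i = 1/|S| for i ∈ S.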

Rethinking Detection of Malicious Clients
An important concern in robust aggregation methods is how to detect malicious or poisonous clients. The line of Krum algorithms estimates the distance-sum Dis-Sum^{(i)} for each client i and places normal clients in S. Dis-Sum^{(i)} is calculated as the sum of distances between the i-th client and several of its neighbors.
In this section, rethinking the detection of malicious clients, we analyze a demo case (the Gaussian noise assumption serves only for illustration and is not necessary for Dim-Krum). Theorem 1 shows that the single-dimension detection error depends on the relative backdoor strength |Δ|/σ, defined as the ratio of the backdoor strength |Δ| to the standard deviation σ across clients. Here |Δ| denotes the expected deviation between backdoored and clean updates, and σ denotes the standard deviation of clean updates.
Theorem 1. Assume the i-th dimension of the clean updates x^{Clean}_i obeys N(μ_i, σ_i²), and the backdoored update satisfies x^{Backdoor}_i = x^{Clean}_i + Δ_i. Define the detection error probability on the i-th dimension as P^{(i)}_Error; then an upper bound of P^{(i)}_Error is

P^{(i)}_Error ≤ Φ(−|Δ_i| / (2σ_i)),

where Φ(·) denotes the standard normal cumulative distribution function.
Define the detection error probability on an indicator set A as P^{(A)}_Error; an upper bound of P^{(A)}_Error is

P^{(A)}_Error ≤ Φ(−(1/2)·√(Σ_{i∈A} Δ_i² / σ_i²)).

Intuitively, malicious clients with higher relative backdoor strengths are easy to detect: the Krum algorithms can easily remove them from S, and other algorithms can assign them lower p_i. Both upper bounds (on a single dimension i and on a dimension set A) in Theorem 1 illustrate that the detection difficulty depends on the relative backdoor strengths. They also motivate calculating Dis-Sum only on dimensions with higher parameter changes in the proposed Dim-Krum (discussed in Sec. 4): choosing dimensions with higher parameter changes tends to yield lower error probability bounds and thus lower detection difficulty.
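As a numerical illustration of the single-dimension bound, the sketch below evaluates Φ(−|Δ|/(2σ)) for a weak (NLP-like) and a strong (CV-like) relative backdoor strength; the concrete Δ and σ values are arbitrary assumptions.

```python
import math

def detection_error_bound(delta, sigma):
    """Upper bound Phi(-|delta| / (2*sigma)) on the single-dimension
    detection error: higher relative strength |delta|/sigma -> lower bound."""
    z = -abs(delta) / (2 * sigma)
    return 0.5 * math.erfc(-z / math.sqrt(2))    # standard normal CDF Phi(z)

# an NLP-like weak relative strength vs. a CV-like strong one
weak = detection_error_bound(delta=0.1, sigma=1.0)   # |Delta|/sigma = 0.1
strong = detection_error_bound(delta=4.0, sigma=1.0) # |Delta|/sigma = 4.0
# weak stays close to chance level while strong is near zero
```

With |Δ|/σ = 0.1 the bound is barely below 0.5 (near-random detection), while with |Δ|/σ = 4 it drops to a few percent, matching the claim that small relative strengths make detection hard.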

Comparison of CV and NLP Backdoors
Empirically, backdoor attacks in the CV domain are easier to detect and defend against than those in NLP. Wan and Chen (2021) report that when 1 out of 10 clients is malicious in CV tasks, the backdoor attack success rates are less than 75% under nearly all typical defenses, even FedAvg. However, both in Yoo and Kwak (2022) and in our experimental results (discussed in Sec. 5), when 1 out of 10 clients is malicious in NLP tasks, the backdoor attack success rates easily exceed 95% for most attacks under most defenses.
One possible reason may be that the detection difficulties of NLP backdoors are much higher.
To validate this, we plot two indicators, Dis-Sum(Bd)/Dis-Sum(Med) (here Bd denotes Backdoor, Med denotes Median, and Dis-Sum(Med) is the median of Dis-Sum^{(i)} over all clients) and |Δ|/σ, in Fig. 1 for various CV and NLP attacks. We also consider calculating these indicators only on a fraction of dimensions with the highest |Δ|, since the estimation of σ is numerically unstable and may be attacked by malicious clients; therefore, we only consider the scales of |Δ| here and assume that σ is equal across dimensions. In Fig. 1, we can validate that the detection difficulties of NLP backdoors are much higher than those of CV backdoors: when all dimensions are involved in calculating Dis-Sum, both Dis-Sum(Bd)/Dis-Sum(Med) and |Δ|/σ are larger on CV backdoors than on NLP backdoors. In Fig. 1a, NLP backdoors cannot be detected when all dimensions are involved in calculating Dis-Sum (namely the fraction is 1), since Dis-Sum(Bd)/Dis-Sum(Med) is smaller than 1. However, as the fraction gets smaller, Dis-Sum(Bd)/Dis-Sum(Med) rises above 1 and |Δ|/σ grows, so the detection difficulty of NLP backdoors decreases.
Inspired by this observation, in the proposed Dim-Krum (discussed in Sec. 4) we calculate Dis-Sum on only a fraction of dimensions with higher |Δ| for better defense performance on NLP backdoors. On CV backdoors, by contrast, |Δ|/σ does not vary much across fractions and Dis-Sum(Bd)/Dis-Sum(Med) ≫ 1 always holds.
Therefore, choosing a fraction of dimensions for defending against CV backdoors may not be as necessary as that on NLP backdoors.
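The effect of restricting Dis-Sum to a top fraction of dimensions can be illustrated with synthetic updates; the toy setup below (Gaussian clean updates, a sparse additive backdoor, all-pairs distance-sums) is a simplified stand-in for the indicator in Fig. 1, not the paper's measurement code.

```python
import numpy as np

def dissum_ratio(updates, backdoor_idx, frac):
    """Dis-Sum(Bd)/Dis-Sum(Med) computed only on the top `frac` fraction of
    dimensions ranked by |x^(i) - x^(j)| per pair of clients."""
    n, d = updates.shape
    k = max(1, int(frac * d))
    dis_sum = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = np.abs(updates[i] - updates[j])
            top = np.sort(diff)[-k:]              # keep the k largest deviations
            dis_sum[i] += np.linalg.norm(top)
    return dis_sum[backdoor_idx] / np.median(dis_sum)

rng = np.random.default_rng(2)
updates = rng.normal(0, 1.0, size=(10, 1000))
updates[9, :5] += 8.0            # backdoor concentrated on a few dimensions
full = dissum_ratio(updates, 9, frac=1.0)
top = dissum_ratio(updates, 9, frac=0.01)
# the ratio is larger when only the top fraction of dimensions is used
```

With a sparse backdoor, the full-dimensional distance is dominated by benign noise, while the top-fraction distance concentrates on the poisoned dimensions, pushing the ratio further above 1.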

Methodology
In this section, we propose the Dim-Krum algorithm based on the Multi-Krum framework.

The Proposed Dim-Krum Algorithm
Inspired by the analysis in Sec. 3.2 and Sec. 3.3, we propose a dimension-wise federated aggregation algorithm based on the Multi-Krum framework, called Dim-Krum, which calculates d_{ij} on a set T_{ij} containing only a small fraction ρ of the dimensions:

d_{ij} = ‖(x^{(i)} − x^{(j)})_{T_{ij}}‖_2, with T_{ij} = top_K({|x^{(i)}_{l′} − x^{(j)}_{l′}|}_{l′}),

where T_{ij} includes K = ⌊ρd⌋ dimensions (d denotes the number of weights) and top_K(·) returns the indexes l′ of the top-K values. Here we choose the dimensions with higher |x^{(i)}_{l′} − x^{(j)}_{l′}|.
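A minimal sketch of the dimension-wise distance, assuming the top-K selection is applied per pair of updates as described above; the synthetic updates and the sparse backdoor are illustrative.

```python
import numpy as np

def dim_krum_distance(x_i, x_j, rho=1e-3):
    """Dim-Krum pairwise distance: the L2 norm of x^(i) - x^(j) restricted to
    the K = floor(rho * d) dimensions with the largest |x_l^(i) - x_l^(j)|."""
    d = x_i.shape[0]
    k = max(1, int(rho * d))
    diff = x_i - x_j
    top_idx = np.argpartition(np.abs(diff), -k)[-k:]   # top-K dimension set T_ij
    return np.linalg.norm(diff[top_idx])

rng = np.random.default_rng(3)
clean_a = rng.normal(0, 0.1, size=10_000)
clean_b = rng.normal(0, 0.1, size=10_000)
backdoored = clean_b.copy()
backdoored[:10] += 2.0           # sparse, strong poisoned dimensions

d_clean = dim_krum_distance(clean_a, clean_b)
d_poison = dim_krum_distance(clean_a, backdoored)
# the sparse backdoor dominates the top-K distance
```

With ρ = 10⁻³ and d = 10,000, K = 10, so the distance between clean and backdoored updates is computed almost entirely on the poisoned dimensions, separating the two far more sharply than a full-dimensional norm would.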

Memory and Adaptive Noise Mechanisms
We also propose the memory and adaptive noise mechanisms. Enhanced with them, the full procedure is shown in Algorithm 1.

Algorithm 1 Dim-Krum Algorithm on Server
Require: Dimension number K in Dim-Krum, scale λ in the adaptive noise mechanism, α = 0.9 in the memory mechanism.
Distribute w Server t=k−1 to clients and train.
Adaptive Noise Mechanism. Before updating the global weights with the aggregated update A({x^{(i)}_{t=k}}_{i=1}^{n}), we add an adaptive noise to it whenever it is not the last update, with per-dimension scale n_i = λσ^{(S)}_i, where n_i is the adaptive noise on the i-th dimension, λ is the noise scale, and σ^{(S)}_i is the standard deviation estimated only from updates of clients in the set S, instead of all clients, in case the deviations are attacked by malicious clients.
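A sketch of the adaptive noise step; the Gaussian form of the noise and the helper names are assumptions, while scaling by λσ^{(S)}_i estimated only from clients in S follows the description above.

```python
import numpy as np

def add_adaptive_noise(aggregated, selected_updates, lam=5.0, rng=None):
    """Add per-dimension noise scaled by lam * sigma_i^(S), where sigma_i^(S)
    is the standard deviation over the updates of clients in S only (a sketch;
    the Gaussian noise distribution is an assumption)."""
    rng = rng or np.random.default_rng()
    sigma_S = selected_updates.std(axis=0)       # estimated only from S
    noise = rng.normal(0.0, 1.0, size=aggregated.shape) * lam * sigma_S
    return aggregated + noise

rng = np.random.default_rng(4)
selected = rng.normal(0, 0.01, size=(5, 100))    # updates of clients in S
agg = selected.mean(axis=0)
noisy = add_adaptive_noise(agg, selected, lam=5.0, rng=rng)
# noise magnitude tracks the per-dimension spread of the selected updates
```

Estimating σ^{(S)}_i from S alone means a malicious client excluded from S cannot inflate the noise scale, which is the point of the "in case the deviations are attacked" caveat.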

Experiments
We first report the experimental setups and then the experimental results. Due to the space limit, other detailed settings and supplementary experimental results are reported in the Appendix.
Backdoor Attack Setups. As illustrated in Sec. 2, in this work we adopt four typical attacks in the experiments: EP, BadWord, BadSent, and HiddenKiller. In federated learning, we adopt n = 10 clients. By default, the dataset distribution across all clients is IID and only 1 client is malicious. On both clean and backdoored clients, the local iteration number is 10,000. The server trains for 30 rounds. The batch size is 32, the optimizer is Adam, and the learning rate is 0.001. We enumerate the malicious client from the 1st to the 10th client, repeat every experiment 10 times, and report the average results.
Federated Aggregation Setups. As in Sec. 2, we adopt several aggregation methods as baselines: FedAvg, Median, FoolsGold, RFA, CRFL, ResidualBase, AAA, and Krum. In CRFL, we adopt a noise standard deviation of 0.01 and bound the parameter norm by 0.05t + 2, where t denotes the time step. In AAA, we train on 1 clean case and 10 backdoored cases, in which we enumerate the malicious client from the 1st to the 10th client, and utilize the updates in these 11 cases to train the attention model for detecting and defending against backdoor updates. In Dim-Krum, ρ = 10^{−3}, and we adopt the memory mechanism and adaptive noises with scale λ = 5.

Experimental Results
To compare backdoor performance on different datasets, we report the average ACC and ASR over the four attacks for multiple aggregation methods in Table 1. Attacking AgNews is relatively difficult, but the backdoor performance on the four datasets is roughly similar; therefore, we only report the average ACC and ASR over the four datasets later. The backdoor performance of the four attacks under multiple aggregation methods is reported in Table 2. For most aggregations, attacks cause only slight ACC decreases with EP, BadWord, and BadSent but severe ACC decreases with HiddenKiller, while clean ACCs drop only slightly under the Dim-Krum aggregation even with HiddenKiller. The defense difficulties of the four backdoor attacks are ordered as EP < HiddenKiller < BadWord < BadSent. Existing aggregation methods cannot defend against the BadSent attack; therefore, we conduct analytic experiments mainly on BadSent in Sec. 6.
Combined with Table 1, we can also conclude that the defense difficulties on NLP tasks are very high: even with one attacker, the ASR is high under existing aggregation methods. However, with our proposed Dim-Krum aggregation, the average ASR over all attacks and datasets decreases from 94.35% (FedAvg) or 53.69% (Krum) to 18.29%, with only a very slight ACC decrease (<2%). On BadSent, the ASR decreases from 100.0% (FedAvg) or 97.45% (Krum) to 22.16%.
In Fig. 2, we also visualize the average ASRs of different aggregation methods during 30 rounds.

Analysis
In this section, we conduct an ablation study and experiments on other data settings and other models, and we propose potential adaptive attacks based on Sec. 3.2. Unless otherwise stated, the reported results are averaged over four datasets under four attacks. Detailed settings and supplementary results are reported in the Appendix.

Ablation Study
We conduct an ablation study on BadSent to verify the proposed mechanisms and study the influence of the hyper-parameters. The results are in Table 3. Without the dimension-wise distance, namely when calculating Dis-Sum on all dimensions (ρ = 1), the ASR is 99.91%, much higher than with Dim-Krum (22.16%). Without the memory or adaptive noise mechanisms, the ASRs also grow higher, which demonstrates the effectiveness of the proposed Dim-Krum and its mechanisms.
Adaptive noises with higher noise scales result in better defense performance but lower clean ACC; λ = 5 is a proper scale since the defense performance improves only slightly with higher noise. Non-adaptive noises can also defend against backdoor attacks well but result in a larger ACC decrease, so our proposed adaptive noises outperform them. For the fraction of dimensions used to calculate Dis-Sum, ρ = 10^{−3}, 10^{−4}, or 10^{−5} is proper; we choose ρ = 10^{−3} for better stability. For larger ρ, Dim-Krum performs similarly to the original Krum algorithms and is a weak defense for NLP tasks.

Generalization to Other Data Settings
In this section, we conduct experiments on non-IID data distributions and cases with multiple malicious clients. We adopt a Dirichlet distribution with concentration parameter α_Dirichlet = 0.9 to simulate non-IID distributions between clients.
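The per-class Dirichlet partition commonly used to simulate non-IID clients can be sketched as follows; the two-class toy labels are illustrative, and α = 0.9 matches the setting above.

```python
import numpy as np

def dirichlet_partition(labels, n_clients=10, alpha=0.9, rng=None):
    """Split instance indexes across clients with a per-class Dirichlet
    distribution: smaller alpha -> more skewed (non-IID) label mixtures."""
    rng = rng or np.random.default_rng(0)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.repeat([0, 1], 5000)                 # a balanced 2-class toy dataset
clients = dirichlet_partition(labels, n_clients=10, alpha=0.9)
# every instance is assigned to exactly one client
```

Each class is divided among clients with Dirichlet-sampled proportions, so with a moderate α like 0.9 the clients receive uneven but non-degenerate label mixtures.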
In Table 4, we can see that Dim-Krum remains a stronger defense than Krum when generalized to other data settings. Non-IID data are harder to defend than IID data, and Dim-Krum outperforms the traditional Krum algorithm. When there are multiple malicious clients, backdoor attacks are harder to defend against; in Table 4, Dim-Krum also outperforms the other aggregation methods in this case.

Generalize to RNN Models
In this section, we validate whether Dim-Krum can generalize to other models. We conduct experiments on RNNs (Rumelhart et al., 1986), adopting Bi-GRU and Bi-LSTM implementations.
In Table 5, the experimental results on RNN models are consistent with the results on the TextCNN model in Table 2. The BadSent attack is hard for Krum algorithms to defend against, but with our proposed Dim-Krum aggregation, the ASR decreases significantly on all attacks with only a slight ACC loss compared to the Krum algorithms.

Table 5: Results of the Bi-GRU and Bi-LSTM models.

Adaptive Attacks
In this section, we consider several adaptive attacks.
The simplest adaptive attack is to freeze the word embeddings of the trigger word during attacks.
In Theorem 1, let G = ‖Δ‖_2 and suppose σ_i = σ for all i; then an upper bound of P^{(A)}_Error is Φ(−G/(2σ)). Lower backdoor attack strengths G thus indicate higher upper bounds of the detection error. Therefore, we adopt the L2 Weight Penalty (WP) (Zhang et al., 2021a) on the parameters, i.e., a penalty term ‖w − ŵ‖_2² added to the poisoned training loss, where ŵ can be the Clean update (trained on the clean client dataset) or w^{Server}_{t=k} + (w^{Server}_{t=k} − w^{Server}_{t=k−1}) (Last, assuming the update is similar to the last update).
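A toy sketch of the WP adaptive attack: adding a penalty β‖w − ŵ‖² pulls the poisoned solution toward the reference, shrinking the deviation Δ. The quadratic poisoned objective and the coefficient β are illustrative assumptions, not the attack's actual training setup.

```python
import numpy as np

def wp_loss_grad(w, task_grad, w_ref, beta=0.1):
    """Gradient of the poisoned task loss plus the L2 weight penalty
    beta * ||w - w_ref||^2, which pulls the update toward a reference
    (Clean or Last) to shrink the backdoor strength G = ||Delta||_2."""
    return task_grad(w) + 2 * beta * (w - w_ref)

# toy poisoned objective whose optimum deviates from the reference
w_ref = np.zeros(8)
poison_opt = np.full(8, 3.0)
task_grad = lambda w: 2 * (w - poison_opt)

w = w_ref.copy()
for _ in range(500):
    w -= 0.05 * wp_loss_grad(w, task_grad, w_ref, beta=0.5)
# the penalty shrinks ||w - w_ref|| below the unpenalized deviation
```

With these quadratics, the penalized optimum sits between w_ref and the poisoned optimum, so the attacker trades attack strength for a smaller, harder-to-detect deviation G.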
Theorem 1 also indicates that the detection error is determined by |Δ_i|/σ_i. Therefore, we propose a dimension-wise adaptive Adversarial Weight Perturbation (AWP) (Garg et al., 2020) attack.

Broader Impact
In this paper, we point out the potential risks of federated aggregation methods in NLP and propose a federated aggregation algorithm to act as a strong defense in NLP.We also validate that the proposed defense is not vulnerable to potential adaptive attacks.We do not find potential negative social impacts in this work.

Conclusion
This work presents the Dim-Krum aggregation algorithm, which detects malicious clients by calculating distances on only a small fraction of dimensions with larger backdoor strengths. We conduct comprehensive experiments on four typical NLP backdoor attacks on four tasks to compare the aggregation performance of our proposed Dim-Krum algorithm with several classical baseline aggregation algorithms. Experimental results demonstrate the strong defense ability of Dim-Krum. Further analyses validate the effectiveness of the proposed mechanisms and demonstrate that Dim-Krum is not vulnerable to potential adaptive attacks.
A.2.1 Datasets

Datasets. We adopt four datasets: the SST-2 dataset (Socher et al., 2013), the IMDb movie reviews dataset (IMDB) (Maas et al., 2011), the Amazon Reviews dataset (Amazon) (Blitzer et al., 2007) (50k sentences selected), and the AgNews dataset (AgNews) (Zhang et al., 2015). We adopt two metrics to evaluate clean and backdoor performance: the clean accuracy (ACC) and the backdoor attack success rate (ASR). The SST-2 dataset includes 67k training instances and 0.8k test instances; the task is sentiment classification of movie reviews. The IMDB dataset includes 25k training instances and 25k test instances; the task is sentiment classification of movie reviews. The Amazon dataset (50k sentences selected) includes 50k training instances and 20k test instances; the task is sentiment classification of reviews on Amazon. The AgNews dataset includes 140k training instances and 7.6k test instances; the task is four-category text classification of news.

Data Preprocessing. We first lowercase the text. The maximum sentence length is 200 words and the vocabulary size is 25,000. We add two special tokens to the vocabulary, <pad> and <unk>; we pad or truncate the text to 200 words using <pad> and replace out-of-vocabulary words with <unk>.

A.2.2 Experimental Setups
Models and Client Training. In the main experiments, we adopt a convolutional neural network (Kim, 2014) for the text classification task. The word embedding dimension is 300, the hidden dimension is 100, and we adopt filters with window sizes of 3, 4, and 5, with 256 feature maps each. The optimizer is Adam with a learning rate of 10^{−3} and a batch size of 32. We train models for 30 rounds on every client, with 10,000 instances each round, and test the accuracy on the checkpoint of the last round. We also adopt RNN (Rumelhart et al., 1986) models in the analysis section. In the Bi-GRU and Bi-LSTM implementations, the layer number is 1 and the hidden size is 256. We adopt bidirectional RNNs.
Backdoor Attack Setups. As illustrated in Sec. 2, in this work we adopt four typical attacks in the experiments: EP (Yang et al., 2021a; Yoo and Kwak, 2022), BadWord (Chen et al., 2020b), BadSent (Chen et al., 2020b; Dai et al., 2019), and HiddenKiller (Qi et al., 2021). For trigger word based attacks, including EP and BadWord, following Kurita et al. (2020) and Yang et al. (2021a), we choose the trigger word from five low-frequency candidate words, i.e., "cf", "mn", "bb", "tq", and "mb". For sentence based attacks, following Kurita et al. (2020), we adopt the trigger sentence "I watched this 3d movie". In HiddenKiller, following Qi et al. (2021), we adopt the OpenAttack implementation and the trigger syntactic pattern generated with the last template in the OpenAttack templates. In federated learning, we adopt n = 10 clients. By default, the dataset distribution across all clients is IID and only 1 client is malicious. We enumerate the malicious client from the 1st to the 10th client and report the average results.

Federated Aggregation Setups. As illustrated in Sec. 2, we adopt several aggregation methods as baselines: FedAvg (McMahan et al., 2017), Median (Chen et al., 2020a; Yin et al., 2018), FoolsGold (Fung et al., 2020), RFA (Pillutla et al., 2019), CRFL (Xie et al., 2021), ResidualBase (Fu et al., 2019), AAA (Wan and Chen, 2021), and Krum (Blanchard et al., 2017; Mhamdi et al., 2018). In CRFL, we adopt a noise standard deviation of 0.01 and bound the parameter norm by 0.05t + 2, where t denotes the time step. On every aggregation at the server, following Xie et al. (2021), we first adopt the RFA (Pillutla et al., 2019) aggregation to obtain the aggregated updates, then add Gaussian noise drawn from N(0, σ_t²) with σ_t = 0.01 to the updates, and finally project the updated parameters to ‖w‖_2 ≤ ρ_t, where ρ_t = 0.05t + 2. The noises and projections are adopted in every round except the last. In AAA, we train on 1 clean case and 10 backdoored cases, in which we enumerate the malicious client from the 1st to the 10th client, and utilize the updates in these 11 cases to train the attention model for detecting and defending against backdoor updates. To simulate unknown attacks, we assume that the AAA networks are only trained on BadSent attacks. In Dim-Krum, ρ = 10^{−3}, and we adopt the memory and adaptive noise mechanisms. In the main results, the adaptive noise scale is λ = 5; on RNN models, since they are more sensitive to parameter changes, we choose λ = 2.
Stability of Aggregation. When we enumerate the malicious client from the 1st to the 10th client and average the results, the defense results may vary considerably for Dim-Krum (standard deviations of ASRs of roughly 10%-20%): the ASR is low when Dim-Krum detects the malicious client successfully and high when it fails to do so.

A.2.3 Setups of Analytic Trials
The analytic trials comparing the detection difficulties of CV and NLP tasks are conducted on both CV and NLP tasks. In the trials, we visualize two indicators: Dis-Sum(Bd)/Dis-Sum(Med) and |Δ|/σ.
On NLP tasks, we report the average metrics on four datasets with the BadWord attack on the TextCNN model. On CV tasks, we adopt a CNN model and the MNIST dataset. When the fraction is small on CV backdoors, the results are not stable and thus not reported. We report the average metrics of three attacks on CV tasks, namely BadNets backdoor attacks, directional backdoor attacks, and label-flipping backdoor attacks.

A.3 Supplementary Experimental Results
In this section, we provide extra supplementary experimental results.
We also provide visualizations to better illustrate some conclusions in the main paper. Fig. 4 visualizes the average ASRs on different datasets during 30 rounds. Fig. 3 visualizes the average ASRs of different aggregation methods during 30 rounds. Fig. 5 visualizes the average ASRs in non-IID and multiple-attacker cases during 30 rounds.
We can conclude that: • Fig. 4 illustrates that Dim-Krum outperforms other aggregation methods on all datasets, and the defense results of the aggregation methods are consistent across datasets.
• Fig. 5 illustrates that (1) non-IID data are harder to defend than IID data for Krum algorithms; (2) when there are multiple malicious clients, backdoor attacks are harder to defend against, while Dim-Krum outperforms the traditional Krum algorithm; and (3) Dim-Krum remains a stronger defense than other methods when generalized to these cases.


Figure 1: Comparison of Dis-Sum(Bd)/Dis-Sum(Med) and |Δ|/σ on CV and NLP backdoors with various fractions of dimensions; here Bd denotes Backdoor and Med denotes Median.
Here we choose dimensions with higher |x^{(i)}_l − x^{(j)}_l|, since dimensions l with higher |x^{Backdoor}_{l,t=k} − x^{Clean}_{l,t=k}| tend to have larger |Δ_l|. We calculate d_{ij} dimension-wisely, while the Krum algorithms usually adopt distances computed over all dimensions.

Figure 2: Visualization of ASRs of different aggregation methods during 30 rounds.

Table 1: Results of four datasets of aggregation algorithms on different backdoor attacks (lowest ASRs in bold).

Table 2: Results of four backdoor attacks of aggregation algorithms on different datasets (lowest ASRs in bold).

Table 3: Results of the ablation study.
(The ASRs of Dim-Krum are relatively high in the first or second rounds compared to later rounds because the model has not learned well yet.) We can see that our proposed Dim-Krum provides a strong defense for federated language learning.

Table 4: Results on non-IID and multiple attacker cases.

Table 6: Results of Dim-Krum under adaptive attacks.

The AWP attack projects the parameters w^{Server}_{t=k} to satisfy |Δ_i|/σ_i ≤ ε at every training iteration, where Δ_i is estimated by w^{Server}_{i,t=k} − ŵ_i, σ_i is estimated by |w^{Server}_{i,t=k} − w^{Server}_{i,t=k−1}|, and ŵ is the clean update. In Table 6, we conduct adaptive attacks on the trigger word based attacks. Though adaptive attacks can result in smaller G and |Δ_i|/σ_i, our proposed Dim-Krum can still defend against them. A possible reason is that attacks with large |Δ_i| are easy to detect, while attacks with small |Δ_i| are easy to mitigate with adaptive noises, since Δ_i is then relatively small compared to the noise n_i.