Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods

Despite the widespread use of Knowledge Graph Embeddings (KGE), little is known about the security vulnerabilities that might disrupt their intended behaviour. We study data poisoning attacks against KGE models for link prediction. These attacks craft adversarial additions or deletions at training time to cause model failure at test time. To select adversarial deletions, we propose to use the model-agnostic instance attribution methods from Interpretable Machine Learning, which identify the training instances that are most influential to a neural model's predictions on test instances. We use these influential triples as adversarial deletions. We further propose a heuristic method that replaces one of the two entities in each influential triple to generate adversarial additions. Our experiments show that the proposed strategies outperform the state-of-the-art data poisoning attacks on KGE models and improve the MRR degradation due to the attacks by up to 62% over the baselines.


Introduction
Knowledge Graph Embeddings (KGE) are the state-of-the-art models for relational learning on large-scale Knowledge Graphs (KG). They drive enterprise products ranging from search engines to social networks to e-commerce (Noy et al., 2019). However, the analysis of their security vulnerabilities has received little attention. Identifying these vulnerabilities is especially important for high-stake domains like healthcare and finance that employ KGE models to make critical decisions (Hogan et al., 2020; Bendtsen and Petrovski, 2019). We study the security vulnerabilities of KGE models through data poisoning attacks (Biggio and Roli, 2018; Joseph et al., 2019) that aim to degrade the predictive performance of learned KGE models by adding triples to or removing triples from the input training graph.

* Equal contribution by last authors.

Designing data poisoning attacks against KGE models poses two main challenges. First, to select adversarial deletions or additions, we need to measure the impact of a candidate perturbation on the model's predictions. But the naive approach of re-training a new KGE model for each candidate perturbation is computationally prohibitive. Second, while the search space for adversarial deletions is limited to existing triples in the KG, it is computationally intractable to enumerate all candidate adversarial additions. Furthermore, attack strategies proposed against models for other graph modalities (Xu et al., 2020) do not scale to KGE models, as they would require gradients with respect to a dense adjacency tensor of the KG.
In this work, we propose to use the model-agnostic instance attribution methods from Interpretable Machine Learning (Molnar, 2019) to select adversarial deletions and additions against KGE models. Instance attribution methods identify the training instances that are influential to model predictions, that is, deleting the instances from the training data would considerably change the model parameters or predictions. These methods are widely used to generate post-hoc example-based explanations for deep neural networks on images (Koh and Liang, 2017; Hanawa et al., 2021; Charpiat et al., 2019) and text (Han et al., 2020; Han and Tsvetkov, 2020; Pezeshkpour et al., 2021). Since KGE models have relatively shallow neural architectures and the instance attribution metrics are independent of the black-box models and the input domain, they are a promising approach to estimate the influence of training triples on the KGE model predictions. Yet, despite their promise, they have not been used on KGE models so far. We use the instance attribution methods to address the challenge of measuring the impact of a candidate adversarial deletion on the model predictions.
We focus on the adversarial goal of degrading the KGE model prediction on a given target triple. To achieve this goal, we use three types of instance attribution methods: Instance Similarity, which compares the feature representations of target and training triples (Hanawa et al., 2021; Charpiat et al., 2019); Gradient Similarity, which compares the gradients of the model's loss function due to target and training triples (Hanawa et al., 2021; Charpiat et al., 2019); and the Influence Function (Koh and Liang, 2017), a principled approach from robust statistics to estimate the effect of removing a training triple on the KGE model's predictions.
Using these metrics, we select the most influential training triple for adversarial deletion. Using the influential triple, we further select adversarial addition by replacing one of the two entities of the influential triple with the most dissimilar entity in the embedding space. The intuition behind this step is to add a triple that would reduce the influence of the influential triple. This solution also overcomes the scalability challenge for adversarial additions by comparing only the entity embeddings to select the replacement. Figure 1 shows an example of the proposed adversarial deletions and additions against KGE models for fraud detection.
We evaluate the proposed attacks for four KGE models (DistMult, ComplEx, ConvE and TransE) on two benchmark datasets, WN18RR and FB15k-237. Our results show that the instance attribution metrics achieve significantly better performance than all state-of-the-art attacks for both adversarial additions and deletions on three out of four models, and better or equivalent performance on the fourth. We find that even simple metrics based on instance similarity outperform the state-of-the-art poisoning attacks and are as effective as the computationally expensive Influence Function.
Thus, the main contribution of our research is a collection of effective adversarial deletion and addition strategies based on instance attribution methods against KGE models.

Knowledge Graph Embeddings
A Knowledge Graph (KG) is a set of triples T ⊆ E × R × E where each triple encodes the relationship r as a typed link between the subject entity s and the object entity o, i.e. T := {t := (s, r, o) | s, o ∈ E and r ∈ R}. Here, E is the set of entities and R is the set of relations in the knowledge graph. Large-scale KGs are often curated automatically from user content or from the Web and thus are incomplete in practice. To predict the missing links in a KG, the state-of-the-art method is to learn low-dimensional feature vectors for the entities and relations in the graph and use them to score the links. These feature vectors are called Knowledge Graph Embeddings (KGE) and denoted as θ := {E, R}, where E ∈ R^(|E| × k) is the embedding matrix for entities, R ∈ R^(|R| × k) is the embedding matrix for relations and k is the embedding dimension.
Scoring Functions: KGE models differ from each other by their scoring functions f : T → R which combine the subject, relation and object embeddings to assign a score to the triple, i.e. f t := f (e s , e r , e o ) where e s , e o ∈ E and e r ∈ R. Table 1 shows the different scoring functions of KGE models used in this research.
These scoring functions are used to categorize the models as additive or multiplicative (Chandrahas et al., 2018). Additive models apply a relation-specific translation from the subject embedding to the object embedding. The scoring function for such models is expressed as f_t := -||M_r^1 e_s + e_r - M_r^2 e_o||, where M_r^1, M_r^2 ∈ R^(k × k) are projection matrices from the entity space to the relation space. An example of additive models is TransE, where M_r^1 = M_r^2 = I. On the other hand, multiplicative models score triples through multiplicative interactions between the subject, relation and object embeddings. The scoring function for these models is expressed as f_t := ⟨e_r, F(e_s, e_o)⟩, where the function F measures the compatibility between the subject and object embeddings.

Training: Since KGs contain only positive triples, synthetic negative samples t' ∉ T are generated to train the KGE model by replacing the subject or object in the positive triples with other entities in E. That is, for each positive triple t := (s, r, o), the set of negative samples is T'_t := {(s', r, o)} ∪ {(s, r, o')}. The training objective is to learn embeddings that score the positive triples existing in the KG higher than the synthetically generated negative triples. To achieve this, a triple-wise loss function L(t, θ) := ℓ(t, θ) + Σ_{t' ∈ T'_t} ℓ(t', θ) is minimized. Thus, the optimal parameters θ̂ learned by the model are defined by θ̂ := argmin_θ Σ_{t ∈ T} L(t, θ). Further details on KGE loss functions and negative sampling strategies are available in Ruffinelli et al. (2020).
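To make the two families of scoring functions and the pairwise training objective concrete, the sketch below implements toy DistMult (multiplicative) and TransE (additive) scores and a logistic loss over a positive triple and one corrupted negative. All embedding values and the specific loss form are illustrative assumptions, not the paper's implementation.

```python
import math

# Toy embeddings (k = 3) for a triple (s, r, o); values are illustrative.
e_s, e_r, e_o = [0.2, 0.5, 0.1], [1.0, 0.4, 0.3], [0.3, 0.6, 0.2]

def score_distmult(es, er, eo):
    """Multiplicative model: tri-linear dot product <e_s, e_r, e_o>."""
    return sum(s * r * o for s, r, o in zip(es, er, eo))

def score_transe(es, er, eo):
    """Additive model: negative L2 distance -||e_s + e_r - e_o||."""
    return -math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(es, er, eo)))

def pair_loss(pos_score, neg_score):
    """One logistic-style triple-wise loss: positive scored above negative."""
    return math.log(1.0 + math.exp(neg_score - pos_score))

pos = score_distmult(e_s, e_r, e_o)
neg = score_distmult(e_s, e_r, [0.9, -0.2, 0.7])  # corrupted object o'
loss = pair_loss(pos, neg)
```

In a real KGE model this loss is summed over all training triples and their negative samples, and minimized by gradient descent over the embedding matrices.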
Missing Link Prediction: Given the learned embeddings θ̂, missing triples in the knowledge graph are predicted by an entity ranking evaluation protocol. Similar to the training process, subject-side negatives t'_s = (s', r, o) and object-side negatives t'_o = (s, r, o') are sampled for each test triple t = (s, r, o) to be predicted. Of these negatives, the triples already existing in the training, validation or test set are filtered out (Bordes et al., 2013). The test triple is then ranked against the remaining negatives based on the scores predicted by the KGE model. The standard evaluation metrics reported over the entire test set are (i) MR: mean of the ranks, (ii) MRR: mean of the reciprocals of ranks and (iii) Hits@n: proportion of triples ranked in the top-n.
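The three ranking metrics above can be computed directly from the filtered ranks of the test triples. The sketch below uses an illustrative list of ranks rather than output from a real model.

```python
# Filtered ranks of five hypothetical test triples (illustrative values).
ranks = [1, 3, 2, 10, 1]

# MR: mean rank (lower is better).
mr = sum(ranks) / len(ranks)

# MRR: mean reciprocal rank (higher is better, in (0, 1]).
mrr = sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(n, ranks):
    """Hits@n: proportion of test triples ranked in the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)
```

An attack that degrades the model pushes the target triples' ranks down, which lowers MRR and Hits@1, the quantities reported in the experiments.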

Poisoning Knowledge Graph Embeddings via Instance Attribution
We consider an adversarial attacker that aims to degrade the KGE model's predictive performance on a set of missing triples that have been ranked highly plausible by the model. We denote these target triples as Z := {z := (z_s, z_r, z_o)}. Since the predicted ranks are based on the predicted scores, to reduce the predicted rank of a target triple, we craft perturbations to the training data that aim to reduce the predicted score of the target triple.
Threat Model: We use the same threat model as the state-of-the-art poisoning attacks on KGE models (Pezeshkpour et al., 2019; Zhang et al., 2019a). We focus on the white-box attack setting where the attacker has full knowledge of the victim model architecture and access to the learned embeddings. However, they cannot perturb the architecture or the embeddings directly, but only through perturbations in the training data. We study both adversarial additions and adversarial deletions. In both settings, the attacker is restricted to making only one edit in the neighbourhood of the target triple. The neighbourhood of the target triple z := (z_s, z_r, z_o) is the set of triples that have the same subject or the same object as the target triple, i.e. X := {x := (x_s, x_r, x_o) | x_s ∈ {z_s, z_o} or x_o ∈ {z_s, z_o}}.
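The neighbourhood restriction can be expressed as a simple filter over the training triples. The sketch below uses hypothetical entity and relation names for illustration; only the filtering logic reflects the threat model.

```python
# Illustrative training triples for a fraud-detection-style KG.
training = [
    ("acme", "owner_of", "acct_1"),
    ("acme", "based_in", "dublin"),
    ("bob", "owner_of", "acct_2"),
    ("acct_1", "linked_to", "acct_3"),
]

# Hypothetical target triple z = (z_s, z_r, z_o).
z = ("acme", "affiliated_with", "acct_1")

def neighbourhood(z, triples):
    """Triples sharing an entity with the target's subject or object."""
    zs, _, zo = z
    return [(s, r, o) for (s, r, o) in triples
            if s in (zs, zo) or o in (zs, zo)]

nb = neighbourhood(z, training)  # candidate adversarial deletions
```

The attacker's search space for deletions is exactly this set, and a single triple from it (or a corruption of one, for additions) constitutes the one allowed edit.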

Instance Attribution Methods
For adversarial deletions, we want to identify the training triples that have influenced the KGE model's prediction on the target triple. Deleting these influential triples from the training set will likely degrade the prediction on the target triple. Thus, we define an influence score φ : T × T → R for pairs of triples (z, x), which indicates the influence of training triple x on the prediction of target triple z. Larger values of the influence score φ(z, x) indicate that removing x from the training data would cause a larger reduction in the predicted score on z.
Trivially, we can compute the influence score for a training triple by removing the triple and re-training the KGE model. However, this is a prohibitively expensive step that requires re-training a new KGE model for every candidate influential triple. Thus, we use the following instance attribution methods from Interpretable Machine Learning (Molnar, 2019) to estimate the influence score φ(z, x) without re-training the model.

Instance Similarity
We estimate the influence of training triple x on the prediction of target triple z based on the similarity of their feature representations. The intuition behind these metrics is to identify the training triples that a KGE model has learnt to be similar to the target triple and thus (might) have influenced the model's prediction on the target triple.
Computing this similarity between triples requires feature vector representations for the triples. We note that while the standard KGE scoring functions assign a scalar score to the triples, this scalar value is obtained by reducing over the embedding dimension. For example, in the tri-linear dot product for DistMult, the embeddings of subject, relation and object are multiplied element-wise and then the scalar score for the triple is obtained by summing over the embedding dimension, i.e. f_t := ⟨e_s, e_r, e_o⟩ := Σ_{i=1}^{k} (e_s)_i (e_r)_i (e_o)_i, where k is the embedding dimension.
Thus, to obtain feature vector representations for the triples f : E × R × E → R^k, we use the state-of-the-art KGE scoring functions without reduction over the embedding dimension. For the DistMult model, the triple feature vector is f_t := e_s • e_r • e_o, where • is the Hadamard (element-wise) product. Table 1 shows the feature vectors for the different KGE models used in this research.
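The distinction between the scalar score and the unreduced feature vector can be sketched as follows; embedding values are illustrative toy numbers, and the TransE-style residual vector is an assumption about what "score without reduction" means for an additive model.

```python
# Toy embeddings (k = 3); values are illustrative.
e_s, e_r, e_o = [0.2, 0.5, 0.1], [1.0, 0.4, 0.3], [0.3, 0.6, 0.2]

def feature_distmult(es, er, eo):
    """Hadamard product e_s * e_r * e_o, kept as a k-dim vector."""
    return [s * r * o for s, r, o in zip(es, er, eo)]

def feature_transe(es, er, eo):
    """Per-dimension translation residual e_s + e_r - e_o (assumed form)."""
    return [s + r - o for s, r, o in zip(es, er, eo)]

f = feature_distmult(e_s, e_r, e_o)
# Summing the feature vector over the embedding dimension recovers the
# usual scalar DistMult score <e_s, e_r, e_o>.
scalar_score = sum(f)
```

The k-dimensional vector f, rather than the scalar score, is what the Instance Similarity metrics compare between target and training triples.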
Given the feature vectors for target triples f (z) and training triples f (x), we follow Hanawa et al. (2021) and define the following metrics.
Dot Metric: This metric computes the similarity between target and training instances as the dot product of their feature vectors. That is, φ_dot(z, x) := ⟨f(z), f(x)⟩.

ℓ2 Metric: This metric computes similarity as the negative Euclidean distance between the feature vectors of the target and training instances. That is, φ_ℓ2(z, x) := -||f(z) - f(x)||_2.

Cos Metric: This metric computes similarity as the dot product between the ℓ2-normalized feature vectors of the target and training instances, i.e. it ignores the magnitude of the vectors and relies only on the angle between them. That is, φ_cos(z, x) := cos(f(z), f(x)).

Here, we denote the dot product of two vectors a and b as ⟨a, b⟩ := Σ_i a_i b_i; the ℓ2 norm of a vector as ||a||_2 := √⟨a, a⟩; and the cosine similarity between vectors a and b as cos(a, b) := ⟨a, b⟩ / (||a||_2 ||b||_2).
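The three metrics differ only in how they normalize the feature vectors, which the sketch below makes explicit; the feature vectors are illustrative toy values rather than model output.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_norm(a):
    return math.sqrt(dot(a, a))

def phi_dot(fz, fx):
    """Dot metric: raw inner product, sensitive to vector magnitude."""
    return dot(fz, fx)

def phi_l2(fz, fx):
    """l2 metric: negative Euclidean distance between feature vectors."""
    return -l2_norm([x - y for x, y in zip(fz, fx)])

def phi_cos(fz, fx):
    """Cos metric: angle only; invariant to scaling either vector."""
    return dot(fz, fx) / (l2_norm(fz) * l2_norm(fx))

# Illustrative feature vectors for a target triple z and training triple x.
fz, fx = [1.0, 2.0], [2.0, 4.0]
```

In the attack, these scores are computed between the target triple and every neighbourhood triple, and the highest-scoring triple is chosen as the adversarial deletion.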

Gradient Similarity
We represent the gradient of the loss for triple z w.r.t. the model parameters as g(z, θ̂) := ∇_θ L(z, θ̂). Gradient similarity metrics compute the similarity between the gradients due to target triple z and the gradients due to training triple x. The intuition is to assign higher influence to training triples that have a similar effect on the model's parameters as the target triple, and are therefore likely to impact the prediction on the target triple (Charpiat et al., 2019). Thus, using the same similarity functions as the Instance Similarity metrics, we define three metrics for gradient similarity: Gradient Dot (GD), Gradient ℓ2 (GL) and Gradient Cos (GC).
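As a minimal sketch of comparing gradients, the code below uses the analytic gradient of a simplified loss l = -⟨e_s, e_r, e_o⟩ for DistMult. It is a deliberate simplification: both triples share the subject and relation, so we compare gradients only on those shared embeddings; real implementations differentiate the full training loss with automatic differentiation.

```python
import math

def distmult_grad_shared(es, er, eo):
    """Gradient of l = -<e_s, e_r, e_o> w.r.t. the shared subject and
    relation embeddings, concatenated into one vector."""
    g_s = [-r * o for r, o in zip(er, eo)]   # dl/de_s = -(e_r * e_o)
    g_r = [-s * o for s, o in zip(es, eo)]   # dl/de_r = -(e_s * e_o)
    return g_s + g_r

def grad_dot(gz, gx):
    return sum(a * b for a, b in zip(gz, gx))

def grad_cos(gz, gx):
    norm = math.sqrt(grad_dot(gz, gz)) * math.sqrt(grad_dot(gx, gx))
    return grad_dot(gz, gx) / norm

# Target triple z and training triple x share subject s and relation r
# but have different objects; embedding values are illustrative.
gz = distmult_grad_shared([0.2, 0.5], [1.0, 0.4], [0.3, 0.6])
gx = distmult_grad_shared([0.2, 0.5], [1.0, 0.4], [0.1, 0.9])
```

A large positive grad_dot or grad_cos indicates that removing x would move the shared parameters in a direction that also affects the prediction on z.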

Influence Functions
Influence Functions (IF) are a classic technique from robust statistics and were introduced to explain the predictions of black-box models by Koh and Liang (2017). To estimate the effect of a training point on a model's predictions, the method first approximates the effect of removing the training point on the learned model parameters. To do this, it performs a first-order Taylor expansion around the learned parameters θ̂ at the optimality conditions.
Following the derivation in Koh and Liang (2017), the effect of removing the training triple x on θ̂ is given by dθ̂ := H_θ̂^{-1} g(x, θ̂). Here, H_θ̂ denotes the Hessian of the loss function, H_θ̂ := 1/n Σ_{t ∈ T} ∇²_θ L(t, θ̂). Using the chain rule, we then approximate the influence of removing x on the model's prediction at z as ⟨g(z, θ̂), dθ̂⟩. Thus, the influence score using IF is defined as φ_IF(z, x) := ⟨g(z, θ̂), H_θ̂^{-1} g(x, θ̂)⟩. Computing the IF for KGE models poses two challenges: (i) storing and inverting the Hessian matrix is computationally too expensive for a large number of parameters; and (ii) because KGE models are non-convex, the Hessian is not guaranteed to be positive definite and thus invertible. To address both challenges, we follow the guidelines in Koh and Liang (2017). Instead of computing the exact Hessian matrix, we estimate the Hessian-vector product (HVP) with the target triple's gradient. That is, for every target triple z, we pre-compute the value H_θ̂^{-1} g(z, θ̂). Then, for each neighbourhood triple x in the training set, we compute φ_IF(z, x) using the pre-computed HVP. Furthermore, we use the stochastic estimator LiSSA (Agarwal et al., 2017), which computes the HVP in linear time using samples from the training data. For the second issue of non-convexity, we add a "damping" term to the Hessian so that it is positive definite and invertible. This term is a hyperparameter that is tuned to ensure that all eigenvalues of the Hessian matrix are positive. Further discussion on the validity of Influence Functions in non-convex settings is available in Koh and Liang (2017).
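The score φ_IF(z, x) = ⟨g(z, θ̂), H⁻¹ g(x, θ̂)⟩ can be illustrated on a toy two-parameter model, where the (damped, positive definite) Hessian can be inverted in closed form. All numbers are illustrative; real KGE models require HVP estimators such as LiSSA instead of an explicit inverse.

```python
def inv_2x2(m):
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def phi_if(g_z, g_x, hessian):
    """Influence score <g(z), H^{-1} g(x)> for a tiny 2-parameter model."""
    h_inv_gx = matvec(inv_2x2(hessian), g_x)
    return sum(a * b for a, b in zip(g_z, h_inv_gx))

# Illustrative damped Hessian (symmetric positive definite) and gradients.
H = [[2.0, 0.5], [0.5, 1.0]]
phi = phi_if([1.0, 0.0], [1.0, 1.0], H)
```

The pre-computation trick in the text corresponds to computing H⁻¹ g(z) once per target triple and reusing it in a plain dot product against each candidate triple's gradient.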

Adversarial Additions
In this attack setting, the adversarial attacker can only add triples to the neighbourhood of the target triple. Using the Instance Attribution metrics above, we select the training triple x := (x_s, x_r, x_o) in the neighbourhood of the target triple z := (z_s, z_r, z_o) that is most influential to the prediction of z. For brevity, let us assume x_s = z_s, i.e. the influential and target triples have the same subject. To generate an adversarial addition from the influential triple, we propose to replace x_o with the most dissimilar entity x'_o. Since the adversarial triple x' := (x_s, x_r, x'_o) has the same subject and relation as the influential triple but a different object, it should reduce the influence of the influential triple on the target triple's prediction. This in turn should degrade the model prediction on the target triple. For multiplicative models, we select the dissimilar entity x'_o using the cosine similarity between x_o and the entities in E. For additive models, we use the ℓ2 similarity between x_o and the entities in E.
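The replacement step reduces to an argmin over entity embeddings, as sketched below for a multiplicative model; the entity names and embedding values are hypothetical.

```python
import math

# Illustrative entity embedding table (k = 2).
entities = {
    "acct_1": [0.9, 0.1],
    "acct_2": [0.8, 0.3],
    "shell_co": [-0.9, -0.2],
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def most_dissimilar(target, entities):
    """Entity whose embedding has lowest cosine similarity to target's."""
    e_t = entities[target]
    return min((e for e in entities if e != target),
               key=lambda e: cos(e_t, entities[e]))

replacement = most_dissimilar("acct_1", entities)
```

Because only entity embeddings are compared, the search is linear in |E| rather than in |E| × |R|, which is what makes adversarial additions tractable.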

Evaluation
We evaluate the effectiveness of the proposed attack strategies in degrading the KGE model's predictions on target triples at test time. We follow the state-of-the-art protocol to evaluate poisoning attacks (Xu et al., 2020): we train a victim KGE model on the original dataset; generate adversarial deletions or additions using one of the attacks; perturb the original dataset; and train a new KGE model on the perturbed dataset. The hyperparameters for the victim and poisoned KGE models are the same.
We evaluate our attacks on four state-of-the-art KGE models (DistMult, ComplEx, ConvE and TransE) on two publicly available benchmark datasets, WN18RR and FB15k-237 (available at https://github.com/TimDettmers/ConvE). To evaluate the effectiveness of the attacks in degrading predictive performance, we select the subset of the benchmark test triples that the victim KGE model ranks the highest (ranks = 1). From this subset, we randomly sample 100 triples as the target triples. This avoids the expensive Hessian inverse estimation in the IF metric for a large number of target triples (for each target triple, this estimation requires one training epoch).
The source code implementation of our experiments is available at https://github.com/ PeruBhardwaj/AttributionAttack.

Baselines:
We evaluate our attacks against baseline methods based on random edits and the state-of-the-art poisoning attacks. Random_n adds or removes a random triple from the neighbourhood of the target triple. Random_g adds or removes a random triple globally and is not restricted to the target's neighbourhood. Direct-Del and Direct-Add are the adversarial deletion and addition attacks proposed in Zhang et al. (2019a). CRIAGE is the poisoning attack from Pezeshkpour et al. (2019) and is a baseline for both deletions and additions. GR (Gradient Rollback) (Lawrence et al., 2021) uses influence estimation to provide post-hoc explanations for KGE models and can also be used to generate adversarial deletions. Thus, we include this method as a baseline for adversarial deletions.
The baseline attacks differ with respect to the definition of their neighbourhood. Thus, to ensure fair evaluation, we implement all methods with the same neighbourhood: triples that are linked to the subject or object of the target triple (Section 3). We use the publicly available implementations for CRIAGE 2 and Gradient Rollback 3 and implement Direct-Del and Direct-Add ourselves. Further details on the datasets, the implementation of the KGE models, the baselines and the computing resources are available in Appendix A and B.
Results: For WN18RR and FB15k-237 respectively, Tables 2 and 3 show the degradation in MRR and Hits@1 due to adversarial deletions, and Tables 4 and 5 due to adversarial additions, for state-of-the-art KGE models. Below we discuss different patterns in these results. We also discuss the runtime efficiency of the attack methods in Appendix C.1.

Comparison with Baselines
We observe that the proposed strategies for adversarial deletions and adversarial additions successfully degrade the predictive performance of KGE models. On the other hand, the state-of-the-art attacks are ineffective or only partially effective. Adversarial deletions from Gradient Rollback perform similarly to the random baselines, likely because this method estimates the influence of a training triple as the sum of its gradients over the training process; it thus does not account for the target triple in the influence estimation. The method is also likely to be effective only for a KGE model trained with a batch size of 1, because it needs to track the gradient updates for each triple. The CRIAGE baseline is only applicable to DistMult and ConvE. But we found that the method ran into a numpy.linalg.LinAlgError: Singular matrix error for ConvE, because the Hessian matrix computed from the victim model embeddings was non-invertible. (This issue might be resolved by changing the hyperparameters of the victim KGE model so that the Hessian matrix from the victim embeddings is invertible, but there is no strategic way to make such changes.) For adversarial deletions on DistMult, the baseline works better than random edits but not than the proposed attacks. (Since the influence estimation in CRIAGE uses BCE loss, we also compare against DistMult trained with BCE in Appendix C.2, but the results are similar.) CRIAGE is also ineffective for adversarial additions.
We see that Direct-Del is effective on TransE, but not on multiplicative models. This is likely because it estimates the influence of a candidate triple as the difference in the triple's score when the neighbour entity embedding is perturbed. The additive nature of this influence score might make it more suitable for additive models. We also see that Direct-Add performs similarly to random additions, likely because it uses random down-sampling.
The proposed attacks based on instance attribution methods consistently outperform the random baselines for adversarial additions and deletions. One exception to this pattern is adversarial additions against TransE on WN18RR. In this case, no influence metric performs better than random neighbourhood edits, though all are effective for adversarial deletions. One possible reason is that the TransE model is designed to learn hierarchical relations like _has_part. We found that the target triples ranked highest by the model have such hierarchical relations, and the influential triple for them has the same relation. That is, the triple (s1, _has_part, s) is the influential triple for (s, _has_part, o). Removing this influential triple breaks the hierarchical link between s1 and s, and degrades the TransE predictions on the target. But adding the triple (s2, _has_part, s) still preserves the hierarchical structure, which TransE can use to score the target correctly. We provide more examples of such relations in Appendix C.3.

Comparison across Influence Metrics
We see that the IF and Gradient Similarity metrics show similar degradation in predictive performance. This indicates that the computationally expensive Hessian inverse in the IF can be avoided and that simpler metrics can identify influential triples with comparable effectiveness. Furthermore, the cos and ℓ2 based Instance Similarity metrics outperform all other methods for adversarial deletions on DistMult, ComplEx and TransE. This effectiveness of naive metrics indicates the high vulnerability of shallow KGE architectures to data poisoning attacks in practice. In contrast, the Instance Similarity metrics are less effective in poisoning ConvE, especially on WN18RR. This is likely because the triple feature vectors for ConvE are based on the output of a deeper neural architecture than the embedding layer alone. Within the Instance Similarity metrics, we see that the dot metric is not as effective as the others. This could be because the dot product does not normalize the triple feature vectors. Thus, training triples with large norms are prioritized over relevant influential triples (Hanawa et al., 2021).

Comparison of datasets
We note that the degradation in predictive performance is more significant on WN18RR than on FB15k-237. This is likely due to the sparser graph structure of WN18RR, i.e. there are fewer neighbours per target triple in WN18RR than in FB15k-237 (Appendix C.4). Thus, the model learns its predictions from few influential triples in WN18RR; and removing only one neighbour significantly degrades the model's predictions on the target triple.
On the other hand, because of the larger number of neighbours in FB15k-237, the model predictions are likely influenced by a group of training triples. Such group effects of training instances on model parameters have been studied in Koh et al. (2019); Basu et al. (2020). We will investigate these methods for KGE models on FB15k-237 in the future.

Related Work
Our work is most closely related to CRIAGE (Pezeshkpour et al., 2019) and Direct Attack (Zhang et al., 2019a), which study both adversarial additions and deletions against KGE models. But CRIAGE is only applicable to multiplicative models, and our experiments (Section 4) show that Direct Attack is effective (with respect to random baselines) on additive models only. In contrast, our instance attribution methods work for all KGE models. Recently, Lawrence et al. (2021) proposed Gradient Rollback to estimate the influence of training triples on KGE model predictions. The original study uses the influential triples for post-hoc explanations, but they can also be used as adversarial deletions. However, the attack stores the model parameter updates for all training triples, which are in the order of millions for benchmark datasets; and our experiments (Section 4) show that it performs similarly to random deletions. By contrast, our influence estimation methods do not require additional storage and are consistently better than the random baselines on all KGE models.
We also study data poisoning attacks against KGE models in Bhardwaj et al. (2021). There, we exploit the inductive abilities of KGE models to select adversarial additions that improve the predictive performance of the model on a set of decoy triples, which in turn degrades the performance on the target triples. These inference-pattern-based attacks cannot be used for adversarial deletions, but we will perform a detailed comparison for adversarial additions in the future. In parallel work, Banerjee et al. (2021) study risk-aware adversarial attacks that aim to reduce the exposure risk of an adversarial attack instead of improving the attack effectiveness. Also, previous studies by Minervini et al. (2017) and Cai and Wang (2018) use adversarial regularization on the training loss of KGE models to improve predictive performance. But these adversarial samples are not in the input domain and aim to improve rather than degrade model performance. Poisoning attacks have also been studied against models for undirected and single-relational graph data (Zügner et al., 2018; Dai et al., 2018; Xu et al., 2020), but they cannot be applied directly to KGE models because they require gradients of a dense adjacency matrix. The instance attribution methods we use are also used for post-hoc example-based explanations of black-box models (Molnar, 2019).

Conclusion
We propose data poisoning attacks against KGE models using instance attribution methods and demonstrate that the proposed attacks outperform the state-of-art attacks. We observe that the attacks are particularly effective when the KGE model relies on few training instances to make predictions, i.e. when the input graph is sparse.
We also observe that shallow neural architectures like DistMult, ComplEx and TransE are vulnerable to naive attacks based on Instance Similarity. These models have shown competitive predictive performance with careful hyperparameter tuning (Ruffinelli et al., 2020; Kadlec et al., 2017), making them promising candidates for use in production pipelines. But our research shows that these performance gains can be brittle. This calls for improved KGE model evaluation that accounts for adversarial robustness in addition to predictive performance.
Additionally, as in Bhardwaj (2020); Bhardwaj et al. (2021), we call for future proposals to defend against the security vulnerabilities of KGE models. Some promising directions might be to use adversarial training techniques or train ensembles of models over subsets of training data to prevent the model predictions being influenced by a few triples only. Specification of the model failure modes through adversarial robustness certificates will also improve the usability of KGE models in high-stake domains like healthcare and finance.

Broader Impact
We study the problem of generating data poisoning attacks against KGE models. These models drive many enterprise products ranging from search engines (Google, Microsoft) to social networks (Facebook) to e-commerce (eBay) (Noy et al., 2019), and are increasingly used in domains with high stakes like healthcare and finance (Hogan et al., 2020;Bendtsen and Petrovski, 2019). Thus, it is important to identify the security vulnerabilities of these models that might be exploited by malicious actors to manipulate the predictions of the model and cause system failure. By highlighting these security vulnerabilities of KGE models, we provide an opportunity to fix them and protect stakeholders from harm. This honours the ACM Code of Ethics to contribute to societal well-being and avoid harm due to computing systems.
Furthermore, to study data poisoning attacks against KGE models, we use the Instance Attribution Methods from Interpretable Machine Learning. These methods can also be used to provide post-hoc explanations for KGE models and thus, improve our understanding of the predictions made by the models. In addition to understanding model predictions, instance based attribution methods can help guide design decisions during KGE model training. There are a vast number of KGE model architectures, training strategies and loss functions, and empirically quantifying the impact of the design choices is often challenging (Ruffinelli et al., 2020). Thus, we would encourage further research on exploring the use of instance attribution methods to understand the impact of these choices on the KGE model predictions. By tracing back the model predictions to the input knowledge graph, we can gain a better understanding of the success or failure of different design choices.

A Dataset Details
We evaluate the proposed attacks on four state-of-the-art KGE models (DistMult, ComplEx, ConvE and TransE) on two publicly available benchmark datasets for link prediction, WN18RR and FB15k-237. For the KGE model evaluation protocol, we filter out triples from the validation and test set that contain unseen entities.
To assess the attack effectiveness in degrading performance on triples predicted as true, we need a set of triples that the victim model predicts as true. Thus, we select the subset of the benchmark test set that the victim KGE model ranks the best (i.e. ranks = 1). If this subset has more than 100 triples, we randomly sample 100 triples as the target triples; otherwise we use all of them as target triples. We do this pre-processing step to avoid the expensive Hessian inverse computation in the Influence Functions (IF) for a large number of target triples: for each target triple, estimating the Hessian inverse (as an HVP) using the LiSSA algorithm requires one training epoch.

We do not use early stopping, to ensure the same hyperparameters for the original and poisoned KGE models. We use an embedding size of 200 for all models on both datasets. An exception is the TransE model on WN18RR, where we use an embedding dimension of 100 due to the expensive time and space complexity of 1-N training for TransE. We manually tuned the hyperparameters for the KGE models based on suggestions from state-of-the-art implementations (Ruffinelli et al., 2020; Dettmers et al., 2018; Lacroix et al., 2018; Costabello et al., 2019). Table 7 shows the MRR and Hits@1 for the original KGE models on WN18RR and FB15k-237. To re-train a KGE model on a poisoned dataset, we use the same hyperparameters as for the original model. We run all model training, adversarial attacks and evaluation on a shared HPC cluster with Nvidia RTX 2080ti, Tesla K40 and V100 GPUs.
To ensure reproducibility, our source code is publicly available on GitHub at https://github.com/PeruBhardwaj/AttributionAttack. The results in Section 4 can be reproduced by passing the argument reproduce-results to the attack scripts. Example commands are available in the bash scripts in our codebase. The hyperparameters used to generate the results can be inspected in the set_hyperparams() function in the file utils.py, or in the log files.
For the LiSSA algorithm used to estimate the Hessian inverse in Influence Functions, we select the hyperparameter values using suggestions from Koh and Liang (2017). The values are selected to ensure that the Taylor expansion in the estimator converges. The hyperparameter values for our experiments are available in the set_if_params() function in the file utils.py of the accompanying codebase.
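For reference, the LiSSA recursion underlying this estimate can be sketched as below. This is a simplified sketch: `hvp_fn` stands in for a Hessian-vector product computed over the training loss, and the damping and scaling values are placeholders that must be tuned so the recursion converges, as noted above:

```python
import numpy as np

def lissa_inverse_hvp(hvp_fn, v, damping=0.01, scale=25.0, num_steps=1000):
    """Estimate H^{-1} v via the LiSSA recursion
        h_{j+1} = v + h_j - (H h_j + damping * h_j) / scale,
    whose fixed point is scale * (H + damping*I)^{-1} v, provided the
    eigenvalues of (H + damping*I) / scale lie in (0, 2)."""
    h = v.copy()
    for _ in range(num_steps):
        h = v + h - (hvp_fn(h) + damping * h) / scale
    return h / scale  # undo the scaling to recover (H + damping*I)^{-1} v
```

In practice the Taylor expansion is truncated after a fixed recursion depth and averaged over several runs; the sketch above shows a single run.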

B.2 Baseline Implementation Details
One of the baselines in Section 4 of the main paper is the Direct-Del and Direct-Add attack from Zhang et al. (2019a). The original study evaluated the method only for the neighbourhood of the subject of the target triple; we extend it to both the subject and the object to ensure a fair comparison with the other attacks. Since no public implementation is available, we implement our own. The Direct-Add attack computes a perturbation score for all possible candidate additions. Since the search space for candidate additions is of the order E × R (where E and R are the sets of entities and relations), it uses random down-sampling to filter the candidates. The percentage of triples down-sampled is not reported in the original paper. So, in this paper, we pick a high and a low value for the percentage of triples to down-sample and generate adversarial additions for both fractions: we arbitrarily choose 20% of all candidate additions as the high value and 5% as the low value.
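The candidate down-sampling step can be sketched as follows, assuming one corrupted slot per target triple so that each candidate is an (entity, relation) pair; the function name and interface are illustrative:

```python
import random

def sample_candidate_additions(entities, relations, frac, seed=0):
    """Randomly down-sample a fraction `frac` of the E x R candidate
    additions before scoring each remaining candidate."""
    candidates = [(e, r) for e in entities for r in relations]
    k = max(1, int(frac * len(candidates)))
    return random.Random(seed).sample(candidates, k)
```

Calling this with `frac=0.2` and `frac=0.05` yields the high and low candidate pools used to build the two poisoned datasets.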
Thus, we generate two poisoned datasets from the attack: one that uses a high number of candidates and another that uses a low number. We train two separate KGE models on these datasets to assess the baseline performance. Table 8 shows the MRR of the original model, and of the poisoned KGE models from the attack with the high and low down-sampling percentages. The results reported for Direct-Add in Section 4 of the main paper are the better of the two results (i.e. those showing more degradation in performance) for each combination.

C.1 Runtime Analysis
We analyze the runtime efficiency of the baseline and proposed attack methods for adversarial deletions. For brevity, we consider the attacks on the DistMult model, but the results for the other models show similar time scales. We see that the Instance Similarity metrics (dot metric, ℓ2 metric, cos metric) are more efficient than the state-of-the-art attacks (Direct-Del, CRIAGE and GR). Furthermore, the ℓ2 metric is almost as quick as random triple selection. The efficiency of the Gradient Similarity metrics is also better than or equivalent to CRIAGE and GR.
Only the attack method based on IF is much slower than the other methods. This is because estimating the Hessian inverse in IF requires one training epoch for every target triple; that is, we run 100 training epochs to obtain the influential triples for 100 target triples. However, our results in Section 4.2 of the main paper show that this expensive computation does not yield improved adversarial deletions, and thus might be unnecessary for selecting influential triples for KGE models.

C.2 Additional Comparison with CRIAGE
The baseline attack method CRIAGE estimates the influence of a training triple using the BCE loss, and is thus likely to be effective only for KGE models trained with BCE loss. In Section 4.1, we found that the proposed attacks are more effective than this baseline.
But since our original models are trained with cross-entropy loss, we perform an additional comparison of the Instance Similarity attacks against CRIAGE for a DistMult model trained with BCE loss. Table 11 shows the reduction in MRR and Hits@1 due to adversarial deletions in this training setting. We find that the Instance Similarity attacks outperform the baseline in this setting as well.

C.3 Analysis of Instance Attribution Methods on WN18RR-TransE
For the TransE model on WN18RR, we found that the instance attribution methods lead to effective adversarial deletions relative to the random baselines, but not to effective adversarial additions (Section 4.1 of the main paper). A possible reason lies in the ability of the TransE model to represent hierarchical relations, i.e. relations that encode a hierarchy between the subject and object entities. For example, (s, _has_part, o) indicates that s is the parent node of o in a hierarchy. We select the Instance Similarity method cos metric for further analysis: it performs best among all instance attribution methods for adversarial deletions, but worse than random neighbourhood edits for adversarial additions. Table 10 shows the relations in the target triples and in the influential triples (i.e. adversarial deletions) selected by the cos metric.
We see that the target triples mostly contain hierarchical relations like _synset_domain_topic_of and _has_part, and that the cos metric identifies influential triples with the same relations. Since our adversarial additions only replace one entity in the influential triple, these edits reinforce the hierarchy structure of the graph instead of breaking it. Thus, the edits perform well as adversarial deletions, but not as additions.

C.4 Neighbourhood Sparsity Comparison on WN18RR and FB15k-237
In Section 4.3 of the main paper, we found that the proposed attacks are significantly more effective for WN18RR than for FB15k-237. This is likely because there are fewer triples in the neighbourhood of target triples for WN18RR than for FB15k-237.
The graph in Figure 2 shows the median number of neighbours of the target triples for WN18RR and FB15k-237. We report the median (instead of the mean) because of the large standard deviation in the number of target triple neighbours for FB15k-237. We see that the neighbourhood of the target triples is significantly sparser for WN18RR than for FB15k-237. Since the KGE model's predictions are learned from fewer triples for WN18RR, they are also easier to perturb with fewer adversarial edits.
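This neighbourhood statistic can be computed as in the following sketch, where we assume a triple's neighbours are the training triples that share its subject or object entity (the function name and data layout are illustrative):

```python
from collections import defaultdict
from statistics import median

def neighbourhood_sizes(train_triples, target_triples):
    """For each target (s, r, o), count training triples sharing s or o."""
    by_entity = defaultdict(set)
    for i, (s, _, o) in enumerate(train_triples):
        by_entity[s].add(i)
        by_entity[o].add(i)
    return [len(by_entity[s] | by_entity[o]) for s, _, o in target_triples]
```

Taking `median(neighbourhood_sizes(...))` per dataset yields the statistic plotted in Figure 2.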