KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction

We present a novel method for relation extraction (RE) from a single sentence, mapping the sentence and two given entities to a canonical fact in a knowledge graph (KG). In this sentential RE setting in particular, the context available in a single sentence is often sparse. This paper introduces the KGPool method to address this sparsity by dynamically expanding the context with additional facts from the KG. It learns the representation of these facts (entity aliases, entity descriptions, etc.) using neural methods, supplementing the sentential context. Unlike existing methods that statically use all expanded facts, KGPool conditions this expansion on the sentence. We study the efficacy of KGPool by evaluating it with different neural models and KGs (Wikidata and NYT Freebase). Our experimental evaluation on standard datasets shows that by feeding the KGPool representation into a Graph Neural Network, the overall method is significantly more accurate than state-of-the-art methods.


Introduction
Knowledge graphs (KGs) are the foundation for many downstream applications and are growing ever larger. However, due to the sheer volume of knowledge and the world's dynamic nature, where new entities emerge and previously unknown facts about them are learned, KGs need to be continuously updated. Distantly supervised Relation Extraction (RE) is an important KG completion task that aims at finding a semantic relationship between two entities annotated in unstructured text with respect to an underlying knowledge graph (Ye and Ling, 2019). In the literature, researchers have mainly studied two variants of RE: 1) multi-instance RE and 2) sentential RE. Multi-instance RE assumes that in a given bag of sentences, if two entities participate in a relation, there exists at least one sentence with these two entities that may contain the target relation (Riedel et al., 2010; Vashishth et al., 2018). In this setting, researchers aim to incorporate contextual signals from the previous occurrences of an entity pair into the neural models to support relation extraction (Ye and Ling, 2019; Xu and Barbosa, 2019; Wu et al., 2019). In contrast, sentential RE restricts the scope of document context to the input sentence alone (disregarding other occurrences of the entity pair) while predicting the KG relation (Sorokin and Gurevych, 2017). Hence, sentential RE is harder than the multi-instance setting because it limits the available context.

[Figure 1: A sentence taken from (Sorokin and Gurevych, 2017), with Wikidata context for the entities wd:Q568631 ("Marc Davis", described as "American artist and animator") and wd:Q266569 ("Animator"), linked by wdt:P106 ("occupation"). Here, entity aliases do not play any role in understanding the sentence for finding the KG relation.]
Recent approaches for RE not only use KGs as the relation inventory but also draw on them to extend contextual knowledge for further improving the RE task (Vashishth et al., 2018; Bastos et al., 2021). A few multi-instance RE methods rely on entity attributes (properties) such as descriptions, aliases, and types (as additional context), along with entity pair occurrences from previous sentences, to improve the overall extraction quality (Ji et al., 2017; Vashishth et al., 2018). For sentential RE, the RECON approach (Bastos et al., 2021) aims to effectively encode KG context derived from entity attributes and entity neighborhood triples. RECON employs a Graph Neural Network (GNN) as a context aggregator for combining sentential context (annotated entities and sentence) and structured KG representation. Although the additional KG context has a positive effect on overall relation extraction in both the multi-instance and sentential RE settings, not all forms of KG context are necessary for every input sentence. Consider Figure 1, where the task is to infer the semantic relation 'occupation' between the two entities wd:Q568631 (Marc Davis) and wd:Q266569 (animator). The Wikidata (Vrandecic, 2012) KG provides semantic information about entities such as descriptions, instance-of statements, and aliases. Here, the entity aliases (Marc Fraser Davis and Fraser Davis for wd:Q568631; cartoonist for wd:Q266569) have no impact on understanding the sentence, because the entities are explicitly mentioned in it. Furthermore, there is empirical evidence in the literature that for several sentences, statically adding all KG context had minimal or negative impact (Bastos et al., 2021). Hence, there are open research questions as to how an RE approach can dynamically select sufficient context from the KG, and whether the selected KG context positively impacts overall performance.
This paper studies these questions by proposing the KGPool approach. KGPool utilizes a self-attention mechanism in a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) to select a sub-graph from the KG to extend the sentential context. The concept of dynamically mapping the structural representation of a KG to a latent representation of a sentence has not been widely studied in prior literature; in RE, KGPool is the initial attempt. The existing approaches (Bastos et al., 2021; Xu and Barbosa, 2019; Wu et al., 2019; Vashishth et al., 2018) feed all the available context (derived from a bag of sentences, a KG, or both) into a neural model and rely on the model to sort out the consequences, resulting in limited performance in many cases (Bastos et al., 2021). Conversely, we study the efficacy of KGPool in dynamically choosing KG context for the sentential RE task using two standard community datasets: NYT Freebase (Riedel et al., 2010) and Wikidata (Sorokin and Gurevych, 2017). (The prefix wd: binds to https://www.wikidata.org/wiki/.) Our work makes the following key contributions:
• The KGPool approach dynamically selects structural knowledge and transforms it into a representation suitable to supplement the latent representation of sentential context learned using a neural model. We deduce that KGPool is the first approach that works independently of the underlying context aggregators used in the literature (Graph Neural Network (Zhu et al., 2019) or LSTM-based (Sorokin and Gurevych, 2017)).
• We are the first to map the task of KG context selection to a graph pooling problem. Our proposed approach thereby legitimizes the application of graph pooling algorithms for choosing the relevant context.
• KGPool, paired with a GNN as context aggregator, significantly outperforms the existing baselines on both datasets, in one experiment increasing precision by 12 points over the baseline (P@30 on NYT Freebase).
Furthermore, our empirical results (cf. Table 3) show that an LSTM model paired with KGPool notably outperforms a GNN-based approach (Zhu et al., 2019) and nearly all multi-instance baselines (Ye and Ling, 2019; Wu et al., 2019; Vashishth et al., 2018) published in recent years.
This paper is structured as follows: Section 2 reviews the related work. Section 3 formalizes the problem and the proposed approach is described in Section 4. Section 5 describes the experimental setup. The results are in Section 6. We conclude in Section 7.

Related Work
Multi-instance RE: a few multi-instance RE approaches use convolutional neural networks (dos Santos et al., 2015), attention CNNs (Wang et al., 2016), and attention-based recurrent neural models (Zhou et al., 2016) for relation extraction. Other approaches, such as (Ji et al., 2017; Vashishth et al., 2018), incorporate entity descriptions and entity and relation aliases from the KG to supplement context from previous sentences. Work in (Vashishth et al., 2018) employs a graph convolution network to encode entity and relation aliases derived from Wikidata. HRERE (Xu and Barbosa, 2019) proposes an approach for jointly learning sentence and KG representations using a cross-entropy loss function. To effectively capture the available entity context in documents, Ye and Ling (2019) suggest an approach incorporating intra-bag and inter-bag attentions. For a detailed survey, we point readers to (Smirnova and Cudré-Mauroux, 2018).
Sentential RE: Sorokin and Gurevych (2017) utilize additional relations present in the sentence to assist the extraction of the target relation using an LSTM-based model. GP-GNN (Zhu et al., 2019) generates the parameters of a GNN based on the input sentence, which enables GNNs to perform relational reasoning on unstructured text inputs. RECON (Bastos et al., 2021) uses entity attributes (aliases, labels, descriptions, instance-of) and KG triples to signal an underlying GNN model for sentential RE. The authors conclude that the multi-instance requirement can be relaxed given a good representation of KG context to enrich the sentential RE model. However, RECON and the multi-instance approaches (Xu and Barbosa, 2019; Vashishth et al., 2018) utilize statically derived context from the KG, i.e., the KG context does not vary depending on the sentence.
Graph Pooling and Dynamic Context Selection: researchers have proposed several models for the graph classification (aka graph pooling) task (Cangea et al., 2018; Ying et al., 2018; Gao and Ji, 2019). These models employ various approaches, such as exploiting graph topology (Rhee et al., 2018) or learning the hierarchical graph structure (Ying et al., 2018). Another graph pooling model relies on node features and topological information using self-attention, in which a fixed number of nodes is always eliminated. In KGPool, the elimination of nodes depends on a context coefficient and node importance (Section 4). For context selection, a recent work focuses on dynamically selecting KG context to optimize a Pre-Trained Language Model (PLM) for entity typing and relation classification (Su et al., 2020). KGPool has the following fundamental difference compared to (Su et al., 2020): KGPool takes inspiration for its self-attention mechanism from (Vaswani et al., 2017) to learn a representation of the KG context. Hence, KGPool works agnostic of the underlying model used for context aggregation (unlike Su et al. (2020), which is tightly coupled with a PLM). Approaches such as (Zhang et al., 2018; Kang et al., 2020) also perform dynamic context selection for their respective tasks. However, these approaches are not focused on knowledge graph context selection.

Problem Statement
We define the KG as a tuple KG = (E, R, T+), where E denotes the set of all vertices in the graph representing entities, R is the set of edges representing relations, and T+ ⊆ E × R × E is the set of all KG triples. The RE task predicts the target relation r_c ∈ R between a given pair of entities (e_i, e_j) from the sentence W = (w_1, w_2, ..., w_l). If no relation is inferred, it returns the 'NA' label. We aim at the sentential RE task, which puts the constraint that the sentence within which a given pair of entities occurs is the only visible sentence from the bag of sentences. We view RE as a classification task, similar to (Sorokin and Gurevych, 2017). In a KG triple τ = (e_h, r, e_t) ∈ T+, the relation is r ∈ R, e_h is the head entity (relation origin), and e_t is the tail entity. For each entity, the associated semantic properties, such as entity label, description, instance-of, and aliases, are known as entity attributes (At_e) (cf. the graph construction step of Figure 2). We aim to model KG contextual information to improve the classification. This is achieved by learning effective representations of At_e, e_h, e_t, and W (cf. Section 4).
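The problem setup above can be made concrete with a small sketch. This is an illustrative container layout only, not the paper's code; the class and function names are our own.

```python
from dataclasses import dataclass

# Hypothetical minimal containers for the tuple KG = (E, R, T+) and a
# sentential RE instance; names are illustrative, not from the paper.
@dataclass(frozen=True)
class Triple:
    head: str       # e_h, the relation origin
    relation: str   # r in R
    tail: str       # e_t

@dataclass
class KG:
    entities: set
    relations: set
    triples: set    # T+, a subset of E x R x E

    def attributes(self, entity, store):
        # At_e: label, description, instance-of, aliases for an entity
        return store.get(entity, {})

# A sentential RE instance: one sentence W and one entity pair; a
# classifier must predict r_c in R, or 'NA' when no relation holds.
def make_instance(sentence_tokens, e_head, e_tail):
    return {"W": sentence_tokens, "pair": (e_head, e_tail), "label": None}

kg = KG(
    entities={"wd:Q568631", "wd:Q266569"},
    relations={"wdt:P106"},
    triples={Triple("wd:Q568631", "wdt:P106", "wd:Q266569")},
)
inst = make_instance(["Marc", "Davis", "was", "an", "animator"],
                     "wd:Q568631", "wd:Q266569")
```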

KGPool Approach
KGPool consists of three components, illustrated in Figure 2: 1) Graph Construction aggregates the sentence, entities, and their attributes into a Heterogeneous Information Graph (HIG) for input representation. 2) Context Pooling utilizes a self-attention mechanism in a graph convolution to calculate attention scores for entity attributes using node features and graph topology. The pooling process allows KGPool to construct a Context Graph (CG), which is a contextualized representation of the HIG with fewer nodes. 3) Context Aggregator takes as input the sentence, entities, and contextual representations of the HIG, and classifies the target relation between the entities. We detail the approach in the following.

Graph Construction
As the first step, we extract entity attributes (At_e) from public dumps of the Freebase (Bollacker et al., 2007) and Wikidata (Vrandecic, 2012) KGs, depending on the dataset. For the KG context, we rely on the most commonly available properties of the entities, as suggested by Bastos et al. (2021): aliases, description, instance-of, and label. An example of various entity attributes is given in Figure 2 at the Graph Construction step. Then, the sentence W is transformed into another representation using a Bi-LSTM (Schuster and Paliwal, 1997) by concatenating its word and character embeddings:

F_W = Bi-LSTM(w_1 ⊕ c_1, ..., w_l ⊕ c_l)   (1)

where w_t and c_t are the word and character embeddings of the t-th token. A similar representation is created for each entity e_i, where i = (h, t):

F_{e_i} = Bi-LSTM(e_i)   (2)

For entity e_i, its KG contexts (entity attributes) At_{e_i}^j (where j = [0...N]) are independently converted into associated embedding representations:

F_{At_{e_i}^j} = Bi-LSTM(At_{e_i}^j)   (3)

For a knowledge representation of the KG context with respect to the sentential context (sentence and annotated entities), we introduce a special graph, the Heterogeneous Information Graph HIG = (A, F), represented by the adjacency matrix A ∈ {0, 1}^{n×n}, where n is the maximum number of neighboring nodes for an entity e_i. Here, F ∈ R^{n×f} is the node feature matrix, assuming each node has f features learned from the Bi-LSTM in Equations 1, 2, and 3. In these equations, BERT (Devlin et al., 2019) or any other recent Transformer-based model could be used; due to hardware limitations, we are bound to a Bi-LSTM with Glove embeddings (Pennington et al., 2014).

[Figure 2: The KGPool approach has three components to supplement sentential context with necessary KG context.]
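The HIG construction can be sketched in NumPy. The star topology below (each attribute node linked to its entity, each entity linked to the sentence node) is our assumption of the layout suggested by Figure 2; the function name is ours.

```python
import numpy as np

# Minimal sketch of building HIG = (A, F): nodes are the sentence W,
# the two entities, and their attribute nodes; A is a binary adjacency
# matrix and F stacks the f-dimensional node embeddings.
def build_hig(sent_vec, entity_vecs, attr_vecs_per_entity):
    feats = [sent_vec] + entity_vecs
    for vecs in attr_vecs_per_entity:
        feats.extend(vecs)
    n = len(feats)
    A = np.zeros((n, n), dtype=int)
    # entities (indices 1..) attach to the sentence node (index 0)
    for e in range(1, 1 + len(entity_vecs)):
        A[0, e] = A[e, 0] = 1
    # attribute nodes attach to their owning entity
    idx = 1 + len(entity_vecs)
    for e, vecs in enumerate(attr_vecs_per_entity, start=1):
        for _ in vecs:
            A[e, idx] = A[idx, e] = 1
            idx += 1
    F = np.stack(feats)          # shape (n, f)
    return A, F

f = 4
rng = np.random.default_rng(0)
# sentence + 2 entities; 3 attributes for e_h, 2 for e_t
A, F = build_hig(rng.normal(size=f),
                 [rng.normal(size=f), rng.normal(size=f)],
                 [[rng.normal(size=f)] * 3, [rng.normal(size=f)] * 2])
```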

Context Pooling
Context pooling is built upon three layers of Graph Convolutional Networks (GCN) and a readout layer associated with each of them. Moreover, the last layer of GCN is coupled with a pooling layer (cf., ablation studies for architectural design choice experiments).

Graph Convolution
Since KGPool is expected to select sufficient context, the Context Graph CG is a reduction of the HIG via the mapping Ψ : HIG → CG. The challenge here is that there is no natural notion of spatial locality, i.e., it is not viable to pool together all context nodes in an "m × m" patch on the HIG, because the complex topological structure of graphs prevents any straightforward, deterministic definition of a "patch". Furthermore, entity nodes have a varying number of neighboring nodes, which makes graph pooling challenging (similar to other graph classification problems (Ying et al., 2018)).
In the HIG, entity nodes do not contain information about their neighbors. Hence, we aim to enrich each entity node with the contextual information of its adjacent nodes. Therefore, we employ a GNN variant and utilize its message-passing architecture to learn node embeddings from a message propagation function. The message propagation function depends on the adjacency matrix A, trainable parameters θ, and the node embeddings F (Ying et al., 2018). We rely on the GCN model of Kipf and Welling (2017). The GCN layer is defined as:

F^(k) = ReLU(D̃^(-1/2) Ã D̃^(-1/2) F^(k-1) θ^(k)),  Ã = A + I   (4)

where D̃ is the degree matrix of Ã. The GCN module may run for k iterations, where k is normally in the range of two to six (Ying et al., 2018). A few graph representation learning approaches propose a readout layer that aggregates node features to learn a fixed-size representation (Xu et al., 2018; Cangea et al., 2018). We perform this summarization after each block of the network (Equation 4) and aggregate all of the intermediate representations together by taking their sum. We define the readout layer R as:

R = (1/N) Σ_{i=1}^{N} F_i ⊕ max_{i=1}^{N} F_i   (5)

where N is the number of nodes in the graph and F is the node feature embedding.
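The propagation and readout described above can be sketched in NumPy. The ReLU non-linearity and the mean-and-max readout are illustrative choices here, not necessarily the paper's exact instantiation.

```python
import numpy as np

def gcn_layer(A, F, theta):
    # Kipf & Welling propagation with self-loops and symmetric
    # normalisation: F' = ReLU(D^-1/2 (A+I) D^-1/2 F Theta)
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ F @ theta, 0.0)

def readout(F):
    # One common readout (our assumption): concatenate the node-wise
    # mean and max into a fixed-size graph summary.
    return np.concatenate([F.mean(axis=0), F.max(axis=0)])

rng = np.random.default_rng(1)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])   # toy 3-node graph
F = rng.normal(size=(3, 4))
theta = rng.normal(size=(4, 4))
H = gcn_layer(A, F, theta)   # enriched node embeddings
R = readout(H)               # fixed-size graph summary
```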

KG Self-Attention Mask
Up to Equation 5, KGPool focuses on learning node features. Next, KGPool learns the importance of each entity attribute node using self-attention. Note that in the HIG, pooling happens only for the entity attribute nodes (At_{e_i}^j from Equation 3); the sentence W and the entities e_h, e_t remain intact. Hence, each entity representation e_h and e_t is enriched by the useful attribute context (KG context), while entity attribute nodes that do not provide relevant context are excluded from the graph. To choose the relevant entity attribute nodes, we use a self-attention score Z (Lee et al., 2019), calculated as follows:

Z = D̃^(-1/2) Ã D̃^(-1/2) F Θ_att   (6)

where Θ_att ∈ R^{F×1} is the only parameter of the pooling layer. For ranking, we take the attention score and pass it through a softmax layer:

Z_score = softmax(Z)   (7)

where Z_score is the normalized self-attention score.
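This scoring step amounts to one graph convolution with a single-column parameter, followed by softmax normalisation. A NumPy sketch follows; the tanh activation on the raw scores is our choice for illustration.

```python
import numpy as np

def attention_scores(A, F, theta_att):
    # SAGPool-style scoring (Lee et al., 2019): one graph convolution
    # with a single-column parameter theta_att in R^{f x 1}, then
    # softmax normalisation, mirroring Equations 6-7.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    z = np.tanh(A_norm @ F @ theta_att).ravel()   # one raw score per node
    e = np.exp(z - z.max())
    return e / e.sum()                            # Z_score, sums to 1

rng = np.random.default_rng(2)
A = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
F = rng.normal(size=(4, 5))
z_score = attention_scores(A, F, rng.normal(size=(5, 1)))
```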
After Equation 7, we have a score for each entity attribute node. Next, we propose a node selection method in which nodes are selected on the basis of the Context Coefficient α, which is a hyper-parameter. The top nodes are selected using σ(Z_score), the standard deviation of Z_score; idx denotes the node selection result, and Z_mask is the corresponding attention mask (Equation 8). Equation 8 acts as a soft constraint in selecting the context nodes for each HIG, depending on the value of α. Learning α during training may cause over-fitting; hence, we treat α as a trade-off parameter, similar to λ in regularization (Bühlmann and Van De Geer, 2011).

Next, the Context Graph (CG) is formed by pooling out the less essential entity attribute nodes (Equation 9). In addition to the dynamically selected nodes, we also inherit the intermediate node and graph representations of the first k − 1 layers, similar to ResNet (He et al., 2016). The intermediate representations (layers 1 to k − 1) and the CG (k-th layer) are combined as follows (Equation 10): in the i-th layer, F^(i)_{e_l} is the node embedding of e_l, l = (h, t), F^(i)_W is the node embedding of the sentence W, and R^(i) is the readout. In the k-th layer, F^(k) is the F_out from Equation 9. The ⊕ denotes concatenation of the vectors.
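The exact form of Equation 8 is not recoverable from the text; the sketch below assumes one plausible reading, in which an attribute node survives only if its normalised score exceeds the mean by α standard deviations, while sentence and entity nodes are never pooled out. Both the threshold form and the names are our assumptions.

```python
import numpy as np

# Hypothetical reading of the alpha-based soft constraint: keep an
# attribute node when Z_score > mean + alpha * std; the sentence and
# entity nodes (the `protected` indices) always stay intact.
def select_context_nodes(z_score, protected, alpha):
    mu, sigma = z_score.mean(), z_score.std()
    keep = z_score > mu + alpha * sigma          # soft constraint on context
    keep[list(protected)] = True                 # W, e_h, e_t stay intact
    idx = np.flatnonzero(keep)                   # surviving node indices
    return idx, z_score[idx]                     # idx and Z_mask

# indices 0-2 = sentence and entities; 3-5 = entity attribute nodes
z = np.array([0.30, 0.25, 0.20, 0.15, 0.06, 0.04])
idx, z_mask = select_context_nodes(z, protected={0, 1, 2}, alpha=1.0)
# all three attribute nodes fall below the threshold and are pooled out
```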

Context Aggregator
Finally, KGPool combines the latent representation (sentential context) with the structured representation learned in Equation 10. To this end, we employ a model M which learns a latent relation vector r. In state-of-the-art approaches that use KG context, the representation of r is learned from the sentential context and all static KG context (Vashishth et al., 2018; Bastos et al., 2021). In KGPool, by contrast, the relation r is realized based on the sentential context and the dynamically chosen KG context. Hence, we employ context aggregators similar to the baselines (Section 5.3) for jointly learning the enriched KG context in the form of the CG and the sentential context; the final relation is then predicted from r.
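The aggregation step can be sketched schematically: the layer-wise entity, sentence, and readout vectors from Equation 10 are concatenated and passed to a classifier standing in for the aggregator M. The linear-softmax classifier and all names here are our simplification, not the paper's actual GNN or LSTM aggregator.

```python
import numpy as np

# Schematic stand-in for the context aggregator M: concatenate the
# per-layer representations (Equation 10) and classify over R + 'NA'.
def aggregate_and_classify(layer_reprs, W_cls):
    parts = []
    for rep in layer_reprs:                      # one dict per GCN layer
        parts += [rep["e_h"], rep["e_t"], rep["W"], rep["readout"]]
    x = np.concatenate(parts)                    # enriched joint context
    logits = W_cls @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()                           # relation distribution

rng = np.random.default_rng(3)
layers = [{"e_h": rng.normal(size=4), "e_t": rng.normal(size=4),
           "W": rng.normal(size=4), "readout": rng.normal(size=8)}
          for _ in range(3)]
probs = aggregate_and_classify(layers, rng.normal(size=(5, 3 * 20)))
```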

Experimental Setup

Datasets
We consider two standard datasets (English version): Wikidata dataset (Sorokin and Gurevych, 2017) and NYT Freebase (Riedel et al., 2010). Both datasets were annotated using distant supervision (associated stats are in Table 1). Datasets include 'NA' as one of the target relations.

KGPool Configurations
KGPool is configured with two context aggregator modules, inherited from existing sentential RE baselines. Our experimental aim is to assess how KGPool performs along with state-of-the-art context aggregators (a comparative study). Our two settings are:
1. KGPool+lstm: KGPool is coupled with the context-aware LSTM model from (Sorokin and Gurevych, 2017) as context aggregator.
2. KGPool+gnn: KGPool is coupled with a GNN-based context aggregator.

Baseline Models
We consider the recent sentential RE approaches for our empirical study. RECON (Bastos et al., 2021): induces KG context (entity attributes and 1&2-hop entity triples) along with the sentence in a GNN.

Metrics and Hyper-parameters
The Graph Construction step (Section 4.1) uses a Bi-LSTM with one hidden layer of size 50 (Bastos et al., 2021). The word embedding dimension is 50, initialized with Glove embeddings (Pennington et al., 2014). The context pooling parameters are from . For the model M, we used the default parameters provided by the authors (Zhu et al., 2019; Sorokin and Gurevych, 2017). For brevity, details are in the appendix.
Metric and Optimization: our experiment settings are borrowed from (Bastos et al., 2021). Hence, on the Wikidata dataset, we use (micro) precision (P), recall (R), and F-score (F1). On the NYT Freebase dataset, (micro) P@10 and P@30 are reported, where P@K represents precision at K percent recall. We also study the effect of the Context Coefficient (α) for both KGPool configurations (trained end-to-end). We ignore the probability predicted for the NA relation during testing. We employ the Adam optimizer (Kingma and Ba, 2015) with categorical cross-entropy loss, where each model is run three times on the whole training set. For the P/R curves (with the best α values of the KGPool variants), the result from the first run of each model is selected. For ablation, we use McNemar's test for statistical significance to determine whether the reduction in error in the KGPool configurations is significant. The difference between models is statistically significant if the p-value < 0.05 (Dietterich, 1998). We release all experiment code and data on a public GitHub repository.
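The P@K metric (precision at K percent recall) can be computed as follows. This is a minimal sketch of the standard procedure: rank predictions by confidence and read off the precision at the point where recall first reaches K percent; the function name is ours.

```python
import numpy as np

# Micro P@K as used on NYT Freebase: walk down the confidence-ranked
# prediction list and report precision where recall reaches K percent.
def precision_at_k_recall(confidences, correct, k_percent):
    order = np.argsort(-np.asarray(confidences))      # rank by confidence
    hits = np.asarray(correct, dtype=float)[order]
    tp = np.cumsum(hits)
    recall = tp / hits.sum()
    precision = tp / np.arange(1, len(hits) + 1)
    reached = np.flatnonzero(recall >= k_percent / 100.0)
    return precision[reached[0]]

conf = [0.9, 0.8, 0.7, 0.6, 0.5]
corr = [1, 1, 0, 1, 0]          # 3 positives in total
p_at_33 = precision_at_k_recall(conf, corr, 33)   # recall 1/3 at rank 1
```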

Results
We conduct our experiments and analysis in response to the research question RQ: "What is the efficacy of KGPool in dynamically selecting the KG context for the sentential RE task?" As such, we also compare KGPool against approaches that do not treat the context dynamically.
Performance on the Wikidata Dataset: we can observe that even when the available context is limited to entity attributes, the KGPool+gnn variant surpasses RECON, which additionally contains context from 1&2-hop triples besides the entity attributes. RECON-EAC and KGPool+gnn rely on entity attributes as KG context with the same context aggregator; when the KGPool+gnn variants choose KG context dynamically, they perform better than RECON-EAC. It is interesting to notice that when an LSTM model is fed with the dynamically chosen context, the performance gain is more than ten absolute points (KGPool+lstm vs. Context-LSTM), even outperforming GP-GNN.
Performance on the NYT Freebase Dataset: similar to the Wikidata dataset, the KGPool+gnn variants significantly outperform all baselines (cf. Table 3), and KGPool keeps the precision higher over a more extended recall range. For both datasets, the KGPool configurations (KGPool+gnn and KGPool+lstm) have the best reported performance, varying as per α. This validates our choice to introduce a soft constraint in selecting the context nodes (cf. Equation 8). The P/R curves in Figure 3 show that KGPool performs better than the baselines over the entire recall range. We conclude that the effective dynamic context selection by KGPool has a positive impact on the sentential RE task, which answers our research question affirmatively.

Ablation Studies
We conducted several ablation studies to understand the behavior of the KGPool configurations.
Significance of Dynamic Context Selection: we perform McNemar's test for the best KGPool configuration against the previous sentential state-of-the-art (i.e., RECON). The results in Table 4 are statistically significant on both datasets, illustrating KGPool's robustness. Although the KGPool+gnn variants achieve statistically significant results against RECON, there exist several sentences for which our approach is unable to select supplementary KG context (the (RW) values in the contingency table). This requires further investigation, which we plan for future work.
Effect on the Degree of Entity Nodes: to study the effect of context pooling (Section 4.2), we conducted a study of the impact of KGPool on the reduction of the average degree of entity nodes (e_i) in the HIG. Table 5 summarizes the effect of the Context Coefficient on the average degree of entity nodes. Irrespective of α, KGPool notably reduces the degree of e_i by removing less relevant nodes.
Architectural Choice Experiment: in KGPool, we chose to introduce pooling in the last layer of a three-layered architecture (three blocks). To support this choice, we performed several additional experiments introducing pooling in various layers, using the Wikidata dataset and the best configuration of our model (KGPool+gnn(α=1)), of which we created several variants. For instance, KGPool+gnn(P=all) is the configuration where we introduce pooling in all three GCN blocks. The configuration KGPool+gnn(P=2&3) has no pooling in the first layer but has a pooling layer in the remaining two GCN blocks. KGPool+gnn is the best configuration, with pooling only in the final layer. In Table 6, we observe that KGPool+gnn with pooling only in the last GCN block has superior performance compared to the other two variants.
Here, the first two layers are used to learn the node features, which are then employed with self-attention for node selection. Our experiments justify this architectural choice. However, with a newer graph pooling technique, such decisions will depend solely on the performance of the approach, and we cannot generalize the results of these experiments.

Table 6: When we introduce pooling in all three layers or in two layers, the performance of KGPool's variants drops. This justifies our choice to add pooling only in the third layer, which gives the best performance (values in bold). We use the best configuration of our model (KGPool+gnn(α=1)).

Case-Studies:
To understand KGPool's performance gain, we report a few top relations in Table 7. It can be observed from this table that, in a few cases, KGPool performs significantly better with less context. In the next case study, to understand KGPool's behavior when adding additional context (more noise), we induce extra context in the form of 1&2-hop triples along with the entity attributes. For this, we considered KGPool's best configurations on the Wikidata dataset. The configurations KGPool+gnn(+T) and KGPool+lstm(+T) represent KGPool fed with the additional triple context. For both configurations, agnostic of the underlying aggregator, we observe a slight increase in performance (Table 8). Several triples are irrelevant sources of information not needed for a given sentence; KGPool can remove that information and does not suffer a performance drop due to the added noise in the context. Details on error analysis, performance on the worst-performing individual relations, and results on a human-annotated dataset are provided in the appendix.

[Table 7: Micro F-scores of a few top relations for KGPool against RECON and GP-GNN; e.g., for the relation vocal specialization, KGPool scores 1.00 while RECON and GP-GNN score 0.00.]

Table 8: To scale the sources of context, we induce additional triple context into KGPool, shown as the (+T) configurations. We use the best configurations of our model (KGPool+gnn(α=1) and KGPool+lstm(α=1)). We observe a slight jump in performance; however, KGPool is still able to pool out irrelevant context.

Discussion and Conclusion
Although KGs are often employed to provide background context in RE tasks (cf. Section 2), there is limited research on defining the relevant context. In this work, we proposed KGPool and provided a set of experiments showing: 1) Given the limited context in individual sentences, dynamically bringing context from the KG significantly improves RE performance. 2) We introduced the Context Coefficient (α), which acts as a soft constraint in determining the relevant entity context nodes. 3) Our approach KGPool is invariant to the context aggregator and enables us to learn an effective knowledge representation of the required KG context for a given sentential context. Our evaluation raises several key questions:
• Data quality impact on an effective knowledge representation: despite KGPool's significant performance, there exist several sentences for which our model is limited compared to the baseline (cf. Table 4).
One potential interpretation concerns the noise injected due to the data quality of the KG context (Weichselbraun et al., 2018). Hence, how the quality of contextual data impacts the performance of context selection approaches is an open direction.
• Impact of additional sources of KG context: in the ablation, we provide a study adding 1&2-hop triples in addition to entity attributes. There is no significant increase in performance, although KGPool is able to remove irrelevant context for a given sentence. Furthermore, we did not consider edge features in the HIG, although KGPool can be extended to support edge features using techniques such as (Simonovsky and Komodakis, 2017). Additional experiments are needed to verify that our empirical observations hold in this setting, and we leave this for future work.
Overall, KGPool provides an effective knowledge representation for set-ups where sentence context is sparse. It is interesting to observe that the effective knowledge representation learned using KGPool paired with an LSTM model outperforms GP-GNN (Zhu et al., 2019) and nearly all multi-instance baselines. Our conclusive results open a new research direction: is it possible to apply effective context selection techniques coupled with deep learning models to other downstream NLP tasks? For example, our results can encourage researchers to extend KGPool or develop novel context selection methods for tasks where KGs have been extensively used as additional background knowledge, such as entity linking, KG completion (Wang et al., 2020; Shi et al., 2017), and recommendation systems.
In this work, we present significant progress in solving the sentential RE task. Harvesting knowledge is an essential goal that human beings pursue alongside the advancement of technology. This research, like many RE approaches, relies on additional signals from public KGs to design systems that extract structured knowledge from unstructured content. As for who may be disadvantaged by this research, we do not think this is applicable, since our study of KG context capabilities is still at an early stage. Having said that, we fully support the development of ethical and responsible AI. Potential bias in the standard public datasets, which may lead to wrong knowledge, needs to be cleaned or corrected with validation mechanisms.
context. It is worthwhile to mention that GP-GNN and Context-LSTM use only sentential context, while RECON and KGPool use KG context. Still, performance is limited for many relations, such as use and different from, as reported in Table 10. The lack of quality context in the KG is possibly a reason for the limited performance of KG-context-induced models in erroneous cases. Detailed exploration is needed to understand the impact of data quality on KGPool's performance, and we leave it for future work.

Table 9: Effect of context pooling. 'DEG' denotes the average degree of an entity node (e_i). We observe a reduction in the degree of entity nodes in the CG compared to the HIG.

A.2 Effect of Context Pooling
In the main paper, we presented the effect of context pooling on KGPool's best configuration (KGPool+gnn). Table 9 describes the reduction in the average degree of nodes for the KGPool+lstm configuration for various context coefficients (α). On both datasets, there is a significant reduction in the degree of nodes. On the Wikidata dataset (Sorokin and Gurevych, 2017), KGPool+lstm with α=1 reports the highest value among its configurations; for this setting, the average degree of nodes is reduced from 5.33 to 1.06. Note that the degree of nodes in the HIG remains the same, whereas for the CG, the degree of nodes differs based on the context aggregator: we train the model end to end, and due to back-propagation, the context weights adjust as per the context aggregator.
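The degree-reduction measurement reported in Table 9 can be sketched as follows. The toy graph and function names are ours; the sketch only illustrates how pooling attribute nodes lowers the average entity-node degree.

```python
import numpy as np

# Sketch of the Table 9 measurement: average degree of the entity
# nodes before pooling (HIG) versus after pooling (CG), where pooling
# drops attribute nodes together with their incident edges.
def avg_entity_degree(A, entity_idx):
    return float(A[entity_idx].sum(axis=1).mean())

def pool_graph(A, keep_idx):
    keep = np.asarray(sorted(keep_idx))
    return A[np.ix_(keep, keep)]

# toy HIG: node 0 = sentence, nodes 1-2 = entities, 3-7 = attributes
A = np.zeros((8, 8), dtype=int)
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7)]
for i, j in edges:
    A[i, j] = A[j, i] = 1

deg_hig = avg_entity_degree(A, [1, 2])          # (4 + 3) / 2 = 3.5
cg = pool_graph(A, keep_idx=[0, 1, 2, 3])       # only one attribute survives
deg_cg = avg_entity_degree(cg, [1, 2])          # (2 + 1) / 2 = 1.5
```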

A.3 Results on a Human Annotated Dataset
The employed datasets, Wikidata (Sorokin and Gurevych, 2017) and NYT Freebase (Riedel et al., 2010), were created using distant supervision techniques. Considering that distant supervision inherently introduces noise, (Zhu et al., 2019) provided a human evaluation setting for a comprehensive ablation study. Following the same setting, RECON provided human-annotated data from the Wikidata dataset (Sorokin and Gurevych, 2017), to verify that the distantly supervised dataset is correct for every pair of entities. Sentences accepted by all annotators are part of the human-annotated dataset, which comprises 500 sentences and 1846 triples in the test set. Table 11 reports KGPool's performance against the sentential baselines. KGPool+gnn continues to outperform the baselines, maintaining behavior similar to that seen on the test sets of the original datasets. The results further reaffirm the robustness of our proposed approach.

Table 10: Micro F-score of the 10 worst-performing relations for KGPool+gnn(α=1) on the Wikidata dataset, with corresponding values for the other sentential RE baselines. The main reason for the limited performance across all models is the scarcity of training data for these relation types.

For augmenting entity attribute context, we relied on public dumps of Wikidata and Freebase. From these dumps, we automatically extracted entities and their properties: labels, aliases, instance-of, and descriptions. For Wikidata, we used the public API via a SPARQL query, and for Freebase, we took the original deprecated dump. We use the NLTK English tokenizer for splitting sentences into tokens in the Riedel dataset, and we do no further data preprocessing. We used one NVIDIA TITAN X Pascal GPU with 12GB of storage to run our experiments. We train the models up to a maximum of 14 epochs and select the best-performing model based on the micro F1 scores on the validation set. Tables 14, 15, and 12 detail the hyper-parameter settings used in our experiments; we do no further hyper-parameter tuning.