Grape: Knowledge Graph Enhanced Passage Reader for Open-domain Question Answering

A common thread of open-domain question answering (QA) models employs a retriever-reader pipeline that first retrieves a handful of relevant passages from Wikipedia and then peruses the passages to produce an answer. However, even state-of-the-art readers fail to capture the complex relationships between entities appearing in questions and retrieved passages, leading to answers that contradict the facts. In light of this, we propose a novel knowledge Graph enhanced passage reader, namely Grape, to improve the reader performance for open-domain QA. Specifically, for each pair of question and retrieved passage, we first construct a localized bipartite graph, attributed with entity embeddings extracted from the intermediate layer of the reader model. Then, a graph neural network learns relational knowledge while fusing graph and contextual representations into the hidden states of the reader model. Experiments on three open-domain QA benchmarks show that Grape can improve state-of-the-art performance by up to 2.2 exact match score with negligible overhead, using the same retriever and retrieved passages. Our code is publicly available at https://github.com/jumxglhf/GRAPE.


Introduction
Open-domain question answering (QA) tasks aim to answer questions in natural language based on large-scale unstructured passages such as Wikipedia (Chen and Yih, 2020; Zhu et al., 2021). A common thread of modern open-domain QA models employs a retriever-reader pipeline, in which a retriever retrieves a handful of relevant passages w.r.t. a given question, and a reader infers a final answer from the received passages (Guu et al., 2020; Karpukhin et al., 2020; Lewis et al., 2020; Izacard and Grave, 2021). Although these methods have achieved remarkable advances on various open-domain QA benchmarks, state-of-the-art readers, such as FiD (Izacard and Grave, 2021), still often produce answers that contradict the facts. As shown in Figure 1, the FiD reader fails to produce correct answers due to an inaccurate understanding of the factual evidence. Therefore, instead of improving the retrievers to saturate the readers with higher answer coverage in the retrieved passages (Yu et al., 2021; Oguz et al., 2022; Yu et al., 2022a), in this work we aim to improve the readers by leveraging structured factual triplets from the knowledge graph (KG).
A knowledge graph, such as Wikidata (Vrandečić and Krötzsch, 2014), contains rich relational information between entities, many of which can be further mapped to corresponding mentions in questions and retrieved passages. To verify the possible improvements brought by the KG, we conduct a simple analysis to examine the percentage of examples whose related fact triplets are present on the KG, i.e., entities in the question are neighbors of answer entities in the retrieved passages through some relation.
We also wonder how many of the above examples are correctly answered by state-of-the-art readers.
Table 1 shows that a great portion of examples (e.g., 58.1% in WebQ) can be matched to related fact triplets on the KG. However, without using the KG, FiD frequently produces incorrect answers to questions in these subsets, leaving significant room for improvement. Therefore, a framework that leverages not only the textual information in retrieved passages but also the fact triplets from the KG is highly desirable for improving reader performance.
In this paper, we propose a novel knowledge Graph enhanced passage reader, namely GRAPE, to improve the reader performance for open-domain QA. Considering the enormous size of KGs and the complex interweaving between entities (e.g., over 5 million entities and over 30 neighbors per entity on Wikidata), direct reasoning on the entire graph is intractable. Thus, we first construct a localized bipartite graph for each pair of question and passage, where nodes represent entities contained within them and edges represent relationships between entities. Then, node representations are initialized with the hidden states of the corresponding entities, extracted from the intermediate layer of the reader model. Next, a graph neural network learns node representations with relational knowledge and passes them back into the hidden states of the reader model. Through this carefully curated design, GRAPE takes into account both aspects of knowledge as a holistic framework.
To the best of our knowledge, we are the first to leverage knowledge graphs to enhance the passage reader for open-domain QA. Our experiments demonstrate that, given the same retriever and the same set of retrieved passages, GRAPE achieves superior performance on three open-domain QA benchmarks (i.e., NQ, TriviaQA, and WebQ), with up to 2.2 improvement in exact match score over the state-of-the-art readers. In particular, our proposed GRAPE nearly doubles the improvement on the subset that can be enhanced by fact triplets from the KG.

Related Work
Text-based open-domain QA. Mainstream open-domain QA models employ a retriever-reader architecture, and recent follow-up work has mainly focused on improving the retriever or the reader (Chen and Yih, 2020; Zhu et al., 2021). For the retriever, most methods split text paragraphs on Wikipedia pages into over 20 million disjoint chunks of 100 words, each of which is called a passage. Traditional methods such as TF-IDF and BM25 explore sparse retrieval strategies by matching the overlapping contents between questions and passages (Chen et al., 2017; Yang et al., 2019). DPR (Karpukhin et al., 2020) revolutionized the field by utilizing dense contextualized vectors for passage indexing. Further research improved performance through better training strategies (Qu et al., 2021), passage re-ranking (Mao et al., 2021), or directly generating passages (Yu et al., 2022a). For the reader, extractive readers aim to locate a span of words in the retrieved passages as the answer (Karpukhin et al., 2020; Iyer et al., 2021; Guu et al., 2020). On the other hand, FiD and RAG, the current state-of-the-art readers, leverage encoder-decoder models such as T5 to generate answers (Lewis et al., 2020; Izacard and Grave, 2021). Nevertheless, these readers only use the text corpus, failing to capture the complex relationships between entities, and hence produce answers that contradict the facts.

KG-enhanced methods for open-domain QA
Recent work has explored incorporating knowledge graphs (KGs) into the retriever-reader pipeline for open-domain QA (Min et al., 2019; Zhou et al., 2020; Oguz et al., 2022; Yu et al., 2021; Hu et al., 2022; Yu et al., 2022b). For example, UniK-QA converted structured KG triples and unstructured text into a unified index, so that the retrieved evidence covers more knowledge. Graph-Retriever (Min et al., 2019) and GNN-encoder (Liu et al., 2022) explored passage-level KG relations for better passage retrieval. KAQA (Zhou et al., 2020) improved passage retrieval by re-ranking according to KG relations between candidate passages. KG-FiD (Yu et al., 2021) utilized KG relations to re-rank retrieved passages with a fine-grained KG filter. However, all of these retriever-enhanced methods focus on improving the quality of retrieved passages before passing them to the reader model, so they still suffer from factual errors. Instead, our GRAPE is the first work to leverage knowledge graphs to enhance the reader, which is orthogonal to these existing KG-enhanced frameworks; our experiments demonstrate that, with the same retriever and the same set of retrieved passages, GRAPE outperforms the state-of-the-art reader FiD by a large margin.

Proposed Method: GRAPE
In this section, we elaborate on the details of the proposed GRAPE. Figure 3 shows its overall architecture. GRAPE adopts a retriever-reader pipeline. Specifically, given a question, it first utilizes DPR to retrieve the top-k relevant passages from Wikipedia (§3.1). Then, to peruse the retrieved passages, it constructs a localized bipartite graph for each pair of question and passage (§3.2.1). The constructed graphs possess tractable yet rich knowledge about the facts among connected entities. Finally, with the curated graphs, structured facts are learned through a relation-aware graph neural network (GNN) and fused into token-level representations of entities in the passages (§3.2.2).

Passage Retrieval
Given a collection of K passages, the goal of the retriever is to map each passage into a low-dimensional vector such that it can efficiently retrieve the top-k passages relevant to the input question. Note that K can be very large (e.g., over 20 million in our experiments) while k is usually small (e.g., 100 in our experiments).
Following DPR (Karpukhin et al., 2020), we employ two independent BERT (Devlin et al., 2019) models to encode the question and the passage separately, and estimate their relevance by computing a single similarity score between their [CLS] token representations. Specifically, given a question q and a passage p_i ∈ {p_1, p_2, ..., p_K}, we encode q by a question encoder E_Q(·) : q → R^d and encode p_i by a passage encoder E_P(·) : p → R^d, where d is the hidden dimension of the used BERT. The ranking score r_q^i of p_i w.r.t. q is calculated as the dot product of the two encodings:

r_q^i = E_Q(q)^T · E_P(p_i).    (1)

We select the k passages whose ranking scores are the top-k highest among all K passages. Before passing the retrieved passages into the reader model, we process each question and passage by inserting special tokens before each entity. For entities in each passage, we use the special token <P ENT>; for those in the question, we use another special token <Q ENT>, as shown in Figure 2. The special tokens play an important role in our proposed reader model, which is illustrated in more detail in §3.2.2.
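The ranking step can be sketched as below. The two BERT encoders are replaced by random stand-in vectors for illustration, so only the dot-product scoring and top-k selection are meaningful:

```python
import numpy as np

def rank_passages(q_vec, passage_vecs, k):
    """Return indices of the top-k passages by dot-product relevance.

    q_vec: (d,) question embedding, e.g., the [CLS] state of E_Q.
    passage_vecs: (K, d) passage embeddings from E_P.
    """
    scores = passage_vecs @ q_vec      # r_q^i = E_Q(q) . E_P(p_i)
    top_k = np.argsort(-scores)[:k]    # highest scores first
    return top_k, scores[top_k]

# Toy example: random stand-ins for the BERT [CLS] embeddings.
rng = np.random.default_rng(0)
q_vec = rng.normal(size=768)
passage_vecs = rng.normal(size=(1000, 768))
idx, scores = rank_passages(q_vec, passage_vecs, k=5)
```

In practice the passage embeddings are precomputed offline and indexed (e.g., with an approximate nearest-neighbor library), so only the question is encoded at query time.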

Graph Construction
Given the retrieved and processed passages, our proposed GRAPE utilizes factual triplets from KGs to construct a localized bipartite graph for each question-passage pair. A KG is defined as a set of triplets KG = {(e_h, r, e_t)}, where e_h, e_t, and r refer to a head entity, a tail entity, and the corresponding relation between them, respectively. Knowledge graphs represent facts in the simple format of triplets, which can easily be leveraged to enrich our knowledge. Taking the question-passage pair in Figure 2 as an example, without any prior knowledge about the authorship of the ballets, selecting the answer among "Marius Petipa", "Lev Ivanov", and "Tchaikovsky" is difficult. Nonetheless, factual triplets from the KG show that these three ballets are connected to "Marius Petipa" and "Lev Ivanov" only through the "possessed by spirit" relation, while their "composer" relation with "Tchaikovsky" makes the answer obvious. By fusing such relational facts from KG triplets, the reader can better comprehend the concrete facts between the involved entities and hence improve performance for open-domain QA.
One naive solution would be to fetch a sub-graph from the KG that includes all entities involved in the questions and the passages. While such a design preserves all potentially relevant information, it suffers from dimensionality and noise issues. Therefore, we propose to construct a localized bipartite graph for each question-passage pair, where only relational facts on relevant entities are kept. That is, in order to prune noisy peripheral relations, only the factual relations between question entities and passage entities are included in the localized bipartite graph. Let a bipartite graph be denoted as G = (U, V, E), where U and V are two disjoint sets of nodes, and E is the edge set containing edges that connect nodes from U to V, or vice versa. Specifically, in GRAPE, U and V are defined as the entity nodes in the question and the retrieved passage, respectively.
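A minimal sketch of this pruning, assuming entity linking has already produced the question and passage entity sets (the entity names and triplets below are illustrative, not real Wikidata data):

```python
def build_bipartite_graph(question_ents, passage_ents, kg_triples):
    """Build the localized bipartite graph for one question-passage pair.

    Only relations between a question entity and a passage entity are
    kept (in either direction), and isolated nodes are dropped,
    mirroring the pruning described above.
    kg_triples: iterable of (head, relation, tail) triplets.
    """
    U, V = set(question_ents), set(passage_ents)
    edges = []
    for h, r, t in kg_triples:
        if (h in U and t in V) or (h in V and t in U):
            edges.append((h, r, t))  # treated as bi-directional downstream
    kept = {h for h, _, _ in edges} | {t for _, _, t in edges}
    return {"U": U & kept, "V": V & kept, "E": edges}

g = build_bipartite_graph(
    {"Swan Lake"}, {"Tchaikovsky", "Paris"},
    [("Swan Lake", "composer", "Tchaikovsky"),
     ("Paris", "capital of", "France")],  # pruned: France is not a question entity
)
```

Note that relations among passage entities themselves (or among question entities) never enter the graph, which is exactly what keeps it small and low-noise.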

Factual Relation Fusion
In this section, we illustrate how the proposed GRAPE fuses structured knowledge from our constructed localized bipartite graphs into the reader.
GRAPE uses FiD (Izacard and Grave, 2021) as the backbone architecture, which utilizes T5 (Raffel et al., 2019) for encoding and decoding. To answer a question q, the input consists of k retrieved documents {doc_1, doc_2, ..., doc_k}, where doc_i denotes the concatenation of the token sequence of q and the token sequence of the i-th retrieved passage p_i. Specifically, doc_i has length t + o, where t and o are the lengths of the question and the passage sequence, respectively. Given doc_i, G_i denotes the localized bipartite graph constructed from it, and I_s(doc_i), I_e(doc_i), and I_t(doc_i) denote the indices of the start, end, and special tokens of all entities in doc_i, respectively.
To fuse the relational knowledge from our constructed graphs, we split the encoder Enc(·) : doc → R^{(t+o)×d} of the reader (i.e., the encoder of T5) into two partitions, Enc_top(·) and Enc_bot(·). The bottom part Enc_bot(·) contains the first L layers of Enc(·) and the top part Enc_top(·) contains the rest, where L is a hyper-parameter. Given doc_i, Enc_bot(·) delivers its encoded intermediate hidden states H_i^b ∈ R^{(t+o)×d}, formulated as:

H_i^b = Enc_bot(doc_i).    (2)

We then extract the node attributes from the spans of the corresponding entities. For each entity node v, its attribute vector x_v is the average of the corresponding tokens' representations. Formally,

x_v = avg(H_i^b[I_s(v) : I_e(v)]),   X_{G_i} = ∥_{v ∈ G_i} x_v,    (3)

where ∥ denotes the vertical concatenation. We use a relation-aware graph neural network (GNN) to conduct relation-aware message passing on the constructed graph G_i with attributes X_{G_i}, and the learning process is formulated as:

H_{G_i} = GNN(G_i, X_{G_i}),    (4)

where the learned node representations H_{G_i} contain relational knowledge extracted from the KG as well as contextualized knowledge from the encoder. For the coherence of reading, the details of GNN(·, ·) are described later in this subsection.
With the learned entity node representations H_{G_i} containing knowledge from the fact relations, we leverage the special tokens to fuse them back into the reader. Specifically, we have

H_i^u = H_i^b,   H_i^u[I_t(doc_i)] = H_{G_i},    (5)

where [·] is the indexing operation. The updated contextualized representations H_i^u are then used as the input of the top part of the encoder to enable further information exchange among regular tokens and the updated special tokens:

H_i = Enc_top(H_i^u).    (6)

Given the question q, GRAPE forwards all k retrieved documents through the above-described encoding process and acquires the hidden states of all documents {H_i}_{i=1}^k. These hidden states are then concatenated and sent to the decoder Dec(·) for answer generation. Formally,

answer = Dec([H_1; H_2; ...; H_k]).    (7)

To sum up, the workflow of our proposed GRAPE consists of the following four steps: (i) obtain the initial contextualized representations via Enc_bot (Equation (2)) and the node attributes (Equation (3)), (ii) fuse the fact relations via the relation-aware GNN (Equations (4) and (5)), (iii) exchange additional information via Enc_top (Equation (6)), and (iv) generate the answer with the decoder (Equation (7)).
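The span-averaging and fuse-back steps can be sketched with toy tensors; here `gnn_fn` is a stand-in for the relation-aware GNN, the encoder halves are omitted, and the span and special-token indices play the roles of I_s, I_e, and I_t:

```python
import numpy as np

def fuse_entities(hidden, spans, special_idx, gnn_fn):
    """Fuse GNN-refined entity representations back into hidden states.

    hidden: (seq_len, d) intermediate states from the bottom encoder.
    spans: list of (start, end) token-index pairs, one per entity node.
    special_idx: index of each entity's special token (<Q ENT>/<P ENT>).
    gnn_fn: stand-in for the relation-aware GNN over node attributes.
    """
    # Node attribute = mean of the entity's token states (span average).
    node_attrs = np.stack([hidden[s:e].mean(axis=0) for s, e in spans])
    node_reprs = gnn_fn(node_attrs)        # relation-aware message passing
    updated = hidden.copy()
    updated[special_idx] = node_reprs      # write back at the <ENT> tokens
    return updated                         # input to the top encoder

hidden = np.zeros((10, 4))
hidden[3:5] = 1.0   # tokens of one entity; its special token sits at index 2
out = fuse_entities(hidden, [(3, 5)], [2], gnn_fn=lambda x: x * 2.0)
```

Writing the graph output only at the special-token positions leaves the original token states untouched, so the top encoder can attend to both the contextual and the graph-enriched views.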
Relation-aware GNN. Here we elaborate the details of the aforementioned GNN(·, ·). Typically, each GNN layer (Hamilton et al., 2017; Kipf and Welling, 2017) can be formulated as

h_v^(n) = AGG({TRANS(h_u^(n-1)) : u ∈ N(v)}),    (8)

where N(v) is the set of neighbors of node v including itself, n is the index of the current layer, and h_v^(n) denotes the representation of node v at the n-th layer. The transform function TRANS(·) projects node representations from the previous layer to a new vector space for message passing. The aggregation function AGG(·) takes a set of node representations and aggregates them into a vector in a unified view (Kipf and Welling, 2017; Veličković et al., 2018; Zhang et al., 2019; Fan et al., 2022; Ju et al., 2022). Our proposed GRAPE uses a multi-layer perceptron as TRANS(·) in each layer. That is,

A^(n) = σ(H^(n-1) W_1^(n)) W_2^(n),    (9)

where A^(n) is the intermediate embedding to be used by AGG(·), W_1^(n) and W_2^(n) denote the learnable parameters, and σ(·) refers to the non-linear activation function.
For the aggregation function AGG(·), we explore a relation-aware attention mechanism. Different from GAT (Veličković et al., 2018), which considers only node representations for the edge attention weight, GRAPE also incorporates the relation representations between nodes. At layer n, for each node v, its representation h_v^(n) is computed as

α_{u,v} = softmax_{u ∈ N(v)} ( a(a_u^(n), a_v^(n), avg(Enc(r_{u,v}))) ),   h_v^(n) = Σ_{u ∈ N(v)} α_{u,v} a_u^(n),    (10)

where a(·) calculates the importance score of node u to node v, considering the contextualized representations of the two connected nodes and the language model's understanding of their relationship (i.e., avg(Enc(r_{u,v}))). We further extend this schema to a multi-head attention pipeline by running multiple operations as described in Equation (10) in parallel. That is,

h_v^(n) = ∥_{m=1}^{M} h_v^(n,m),    (11)

where M denotes the number of heads and h_v^(n,m) refers to the learned representation of the m-th head at the n-th layer. Finally, the node representations at the last layer N are used as the output of GNN(·, ·), where N is the number of layers in the GNN.
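A single-head, single-node sketch of this aggregation; the transformed node features and relation encodings avg(Enc(r)) are random stand-ins, and the scoring function a(·) is reduced, for illustration, to a learnable vector applied to the concatenated features:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def relation_aware_attention(a_v, neigh_feats, rel_feats, w):
    """One node's aggregation step with relation-aware attention.

    a_v: (d,) transformed feature of the target node v.
    neigh_feats: (n, d) transformed features of its neighbors u in N(v).
    rel_feats: (n, d) relation encodings avg(Enc(r_{u,v})), one per edge.
    w: (3*d,) scoring vector standing in for the function a(.).
    """
    # Score each neighbor from [a_u || a_v || relation], then normalize.
    logits = np.array([
        w @ np.concatenate([a_u, a_v, r])
        for a_u, r in zip(neigh_feats, rel_feats)
    ])
    alpha = softmax(logits)        # attention weights over N(v)
    return alpha @ neigh_feats     # weighted sum of neighbor messages

rng = np.random.default_rng(1)
d, n = 8, 3
h = relation_aware_attention(
    rng.normal(size=d), rng.normal(size=(n, d)),
    rng.normal(size=(n, d)), rng.normal(size=3 * d),
)
```

The multi-head variant of Equation (11) simply runs M copies of this step with independent parameters and concatenates the results.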
In summary, our relation-aware GNN combines the current reader's understanding of the factual relationships among nodes (i.e., avg(Enc(r))) with the intermediate hidden states X_{G_i} from Enc_bot(·). Enriched by structured fact relations, entity node representations are then fused back into the reader's encoder so that GRAPE can comprehend facts between entities during the encoding process.

Experiments
In this section, we conduct comprehensive experiments on three community-acknowledged public open-domain QA benchmarks: Natural Questions (NQ), based on Google search queries; TriviaQA, based on questions from trivia and quiz-league websites; and Web Questions (WebQ), based on questions from the Google Suggest API (Kwiatkowski et al., 2019; Joshi et al., 2017; Berant et al., 2013). We use the same train/dev/test splits and preprocessing techniques as (Izacard and Grave, 2021; Karpukhin et al., 2020).

Experimental Setup
Retrieval Corpus. We followed the same process as (Karpukhin et al., 2020; Lewis et al., 2020) for preprocessing Wikipedia pages. We split each Wikipedia page into disjoint 100-word passages, resulting in 21 million passages in total. As the knowledge graph used to construct our localized bipartite graphs, we used English Wikidata (Vrandečić and Krötzsch, 2014). The total numbers of aligned entities, relations, and triplets on Wikidata are 2.7M, 630, and 14M, respectively. We used ELQ (Li et al., 2020) to identify mentions in the question and retrieved passages and link them to the corresponding entities on Wikidata.

Implementation Details
In GRAPE, the involved hyper-parameters are the number of retrieved passages k, the number of GNN layers N, the number of GNN heads M, and the encoder layer index L at which Enc_top and Enc_bot are partitioned. (The Wikipedia and Wikidata snapshots were collected in December 2019, and we only used the most visited top 1M entities.) We use k = 100, N = 2, L = 3, and M = 8 as the default setup. The hidden dimension of the GNN in GRAPE is set to that of its language model (i.e., d = 768 for the base configuration and d = 1024 for the large configuration). Since N = 2 and M = 8 are standard values for most GNNs with attention mechanisms, able to capture 2-hop neighbor information while remaining stable (Veličković et al., 2018), we simply follow the same principle.
For the encoder layer index L, we search for the optimal value in the range {3, 4, 6, 8, 9}. According to our experiments, different selections of L do not have much impact on performance, indicating that the infusion of structured knowledge does not strongly correlate with the degree of contextualization of the entity embeddings. However, we do observe a faster convergence rate for lower values of L, so we set L to 3. Other hyper-parameter selections related to training with the best performance across all datasets are shown in Table 5 in §A.1. The software and hardware information can also be found in §A.1 in the appendix.

Evaluation Metrics
We use the standard evaluation metric for open-domain QA: the exact match score (EM) (Rajpurkar et al., 2016; Zhu et al., 2021). An answer is considered correct if and only if its normalized form has a match in the acceptable answer list. For all experiments, we conduct 3 runs with different random seeds and report the average.
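A sketch of the metric; the normalization here follows the common SQuAD-style convention (lowercasing, stripping punctuation and articles, collapsing whitespace) and may differ in detail from the exact normalization used in the experiments:

```python
import re
import string

def normalize(s):
    """Lowercase, drop punctuation and English articles, collapse spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    """EM = 1 iff the normalized prediction matches any acceptable answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

assert exact_match("The Nutcracker!", ["nutcracker"]) == 1.0
assert exact_match("Petipa", ["Tchaikovsky"]) == 0.0
```

The dataset-level EM score is then the mean of this indicator over all test questions.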

Baseline Models
We compare GRAPE with four groups of baselines: (i) The first group includes closed-book models, where no Wikipedia document is provided during training and inference: T5-11B (Raffel et al., 2019) and GPT-3 (Brown et al., 2020). (ii) The second contains extractive models, which utilize passages extracted by enhanced retrievers and find the span of the answer: DPR (Karpukhin et al., 2020), RIDER (Mao et al., 2021), and RECONSIDER (Iyer et al., 2021). (iii) The third includes approaches that utilize a KG for retrieval: Graph-Retriever (Min et al., 2019), Path-Retriever (Asai et al., 2019), and KAQA (Zhou et al., 2020). (iv) The last group contains advanced generative readers: RAG (Lewis et al., 2020), REALM (Guu et al., 2020), the reader of Sachan et al. (2021), and FiD (Izacard and Grave, 2021). For FiD, we compare both its base and large versions; for the other baselines, we compare their best-performing versions. We note that our method is the first to use the knowledge graph to improve reader performance for open-domain QA, and is hence orthogonal to existing work using the knowledge graph to improve passage retrieval (Liu et al., 2022) or re-ranking (Yu et al., 2021). Our experiments show that, with the same retriever and the same set of retrieved passages, GRAPE outperforms the state-of-the-art reader FiD by a large margin.

Comparison with Baselines
Table 2 shows the performance of 13 baselines as well as our GRAPE. We observe that our proposed GRAPE significantly outperforms the best-performing baselines across all datasets in both base and large configurations. Specifically, GRAPE improves over FiD by 0.5, 1.2, and 1.6 EM score with the base model, and by 2.1, 2.2, and 1.2 EM score with the large model, on NQ, TriviaQA, and WebQ, respectively. Albeit competitive on all datasets, GRAPE brings larger improvements on TriviaQA and WebQ than on NQ. We believe the reason is that on NQ, the percentage of questions that benefit from factual KG relations, as shown in Table 1 and Table 4, is relatively lower compared to TriviaQA and WebQ. In the large configuration, even though some questions are not directly favored by structured facts, the additional information re-routing of GRAPE still helps through the additional learning capacity, outperforming FiD-large by 2.1 EM score. We also note that the performance of RAG on WebQ is better than FiD-base and close to GRAPE-base, which is caused by a tremendous amount of additional training on other open-domain QA datasets.

Ablation Study
We design two variants of GRAPE. The first is GRAPE without considering relations between entity nodes, i.e., avg(Enc(r_{u,v})) in Equation (10) is removed. The goal is to validate the improvement brought by relational knowledge; this variant is denoted as w/o Rel in Table 3. The second is GRAPE without considering the relations as well as neighbor differences. The goal is to validate whether GRAPE can differentiate important neighbors without the attention mechanism; this variant is denoted as w/o Att. As shown in Table 3, the performance drops when removing either of the two mechanisms, demonstrating their effectiveness and further validating the rich inductive bias brought by the factual relations between entities. Besides, we also notice that the incorporation of relations is more important than the attention mechanism.

Improvement Analysis by KG Relation
Since GRAPE utilizes factual relational knowledge from the KG, to validate the legitimacy of this assumption we analyze the performance gain on the subset of questions that can be directly solved by a factual triplet on the KG (i.e., the constructed graphs for these questions contain at least one edge linking the answer entity to an entity in the question).
From Table 4, we observe that GRAPE significantly improves performance on this subset, which factual relations from the KG naturally favor. For example, with the base model, GRAPE improves the overall performance by 0.5, 1.2, and 1.6 EM score on the three datasets, and almost doubles the performance margin (i.e., 2.6, 3.8, and 3.7) on their subsets. This phenomenon demonstrates that GRAPE utilizes the inductive bias we introduce through the graphs, and that the major performance gain is rooted in the factual relational knowledge from the KG, which further validates the design of GRAPE.

Scaling with Number of Passages
We further evaluate the performance of GRAPE with respect to different numbers of retrieved passages (i.e., D = {5, 10, 25, 50, 100}), as shown in Figure 4. We observe that, given the same number of passages, GRAPE consistently outperforms FiD, with greater performance gains given more passages. Specifically, starting from 25 retrieved passages, GRAPE performs on par with FiD using only half the number of retrieved passages. This phenomenon demonstrates that, when the answer is well represented in the retrieved passages, the facts introduced by our curated graphs constructed from the KG significantly help the reader answer questions.

Case Studies on KG Relations
To further validate the improvement brought by GRAPE, we analyze samples that are incorrectly answered by FiD but correctly answered by GRAPE, and visualize the constructed graphs for these samples, as shown in Figure 5. From these samples, we can observe that the performance gain indeed comes from the strong enhancement brought by the fact relations in the constructed graphs. For example, in the first example, with the fact relations, GRAPE understands that "Arges" is a member of "Cyclopes", which directly helps answer the given question. In the third example, we observe that FiD delivers an answer that is only partially correct, whereas, enhanced by the fact relations from the KG, GRAPE correctly answers the question because of the triplet ("UK", "applies to jurisdiction", "Parliament of UK"). This design is tractable yet effective, because only entities highly correlated with useful facts are included: passage entities unrelated to question entities are very likely to be marginal and hence removed, and only the factual triplets helpful for answering the questions are kept. Relations among passage entities are most likely peripheral and hence neglected in the bipartite graph.

Conclusion
In this work, we study the problem of open-domain QA. We discover that state-of-the-art readers fail to capture the complex relationships between entities appearing in questions and retrieved passages, resulting in produced answers that contradict the facts. To this end, we propose a novel knowledge Graph enhanced passage Reader (GRAPE) to improve the reader performance for open-domain QA. Specifically, for each pair of question and retrieved passage, we construct an informative localized bipartite graph and explore an expressive relation-aware GNN to learn entity representations that contain contextual knowledge from passages as well as fact relations from the KG. Experiments on three open-domain QA benchmarks show that GRAPE significantly outperforms state-of-the-art readers by up to 2.2 exact match score. In the future, we plan to enrich the structured information contained in our graphs from other external resources.

Figure 1 :
Figure 1: The answers produced by the SoTA reader FiD contradict the facts in the knowledge graph.

Figure 2 :
Figure 2: Given a pair of question and passage, the proposed GRAPE constructs a localized bipartite graph.

Figure 3 :
Figure 3: Two documents are independently encoded by our GRAPE with their corresponding localized bipartite graphs, leveraging both textual and structured information. The relation-aware GNN learns the structured knowledge from the localized bipartite graphs, attributed with entity representations extracted from the T5-Encoder_Bot. The node representations are then fused into the T5-Encoder_Top, which provides the hidden representations of the document. Finally, the T5-Decoder takes the hidden states from all documents and generates the answer.

Figure 4 :
Figure 4: The performance on the test set of TriviaQA w.r.t. the number of passages.

Figure 5 :
Figure 5: Case studies on samples that are incorrectly answered by FiD but correctly answered by GRAPE. Relations with green arrows indicate the factual relations from the KG that enhance the question answering.

Table 1 :
The error rate of state-of-the-art reader (i.e., FiD base) on the subset of data examples in the test set that have related fact triplets on the knowledge graph.
Question: <Q ENT> Swan lake, <Q ENT> the sleeping beauty and <Q ENT> the nutcracker are three famous ballets by? Passage: 'The Nutcracker' is an 1892 two-act ballet, originally choreographed by <P ENT> Marius Petipa and <P ENT> Lev Ivanov with a score by <P ENT> Pyotr Ilyich Tchaikovsky (Op. 71). The libretto is adapted from E. T. A. Hoffmann's story "The Nutcracker and the Mouse King"…

There exists a bi-directional edge (e_h, e_t) between e_h ∈ U and e_t ∈ V if and only if {(e_h, r_{h,t}, e_t) : r_{h,t} ∈ R, (e_h, r_{h,t}, e_t) ∈ KG or (e_t, r_{t,h}, e_h) ∈ KG} ≠ ∅, where R denotes the set of all relation types on the KG. Isolated nodes without any neighbors are removed from the graph. An example graph is shown in Figure 2, and Table 6 (in the appendix) shows the statistics of the constructed graphs.

Table 2 :
Exact match scores over the test sets of Natural Questions, TriviaQA, and Web Questions. We put the training details (such as learning rate, batch size, dev performance, etc.) corresponding to the performance of our GRAPE in Table 5. Numbers in parentheses are improvements of GRAPE over the corresponding best-performing baseline. Note that ⋆ means the model is warmed up with external training data from Natural Questions.

Table 3 :
Ablation study of GRAPE without relation knowledge or the attention mechanism. The first line refers to the performance of the base model; the second line refers to the large model.

Table 4 :
Exact match score on the subset of questions that can be enhanced by factual triplets from the KG.

Table 5 :
Best training hyper-parameters for the results reported in Table 2. Note that GRAPE only resolves errors on fact-related examples; moreover, GRAPE explores fact relations from Wikidata and hence may omit fact relations from other sources such as Freebase.

Table 6 :
Statistics of our constructed graphs for all datasets. Mean and standard deviation are calculated for each attribute. 100 passages are retrieved for each question.