Direct Fact Retrieval from Knowledge Graphs without Entity Linking

There has been a surge of interest in utilizing Knowledge Graphs (KGs) for various natural language processing and understanding tasks. The conventional mechanism to retrieve facts from a KG usually involves three steps: entity span detection, entity disambiguation, and relation classification. However, this approach requires additional labels for training each of the three subcomponents, beyond pairs of input texts and facts, and may also accumulate errors propagated from failures in earlier steps. To tackle these limitations, we propose a simple knowledge retrieval framework that directly retrieves facts from a KG given the input text based on their representational similarities, which we refer to as Direct Fact Retrieval (DiFaR). Specifically, we first embed all facts in the KG onto a dense embedding space using a language model trained with only pairs of input texts and facts, and then return the facts nearest to the input text. Since a fact, consisting of only two entities and one relation, has little context to encode, we further refine the ranks of the top-k retrieved facts with a reranker that contextualizes the input text and the fact jointly. We validate our DiFaR framework on multiple fact retrieval tasks, showing that it significantly outperforms relevant baselines that use the three-step approach.


Introduction
* Work done while interning at Amazon. Corresponding author: Jinheon Baek (jinheon.baek@kaist.ac.kr)

Knowledge graphs (KGs) (Vrandecic and Krötzsch, 2014; Lehmann et al., 2015), which consist of a set of facts represented in the form of a (head entity, relation, tail entity) triplet, can store a large amount of world knowledge. In natural language applications, language models (LMs) (Devlin et al., 2019; Brown et al., 2020) are commonly used; however, the knowledge internalized in their parameters is often incomplete, inaccurate, and outdated. Therefore, several recent works suggest augmenting LMs with facts from KGs, for example, in question answering (Oguz et al., 2022; Ma et al., 2022) and dialogue generation (Galetzka et al., 2021; Kang et al., 2022b). However, despite the broad applications of KGs, the existing mechanism for retrieving facts from them is, in many cases, unnecessarily complex. In particular, to retrieve facts from KGs, existing work (Fu et al., 2020; Lan et al., 2021) relies on three sequential steps, consisting of span detection, entity disambiguation, and relation classification, as illustrated in Figure 1a. For example, given the input text "Where was Michael Phelps born?", such methods first detect the span of an entity within the input, which corresponds to "Michael Phelps". Then, they match the entity mention in the input to an entity id in the KG; these two steps are often called entity linking. Finally, among the 91 relations associated with the entity Michael Phelps, they select the one relation relevant to the input, namely "place of birth". The aforementioned approach has several drawbacks. First, all three sub-modules in the existing pipeline require module-specific labels in addition to query-triplet pairs for training. However, in real-world settings, high-quality training data is limited, and annotating it incurs significant cost. Second, such a pipeline approach is prone to error propagation across steps (Singh et al., 2020; Han et al., 2020).
For example, if span detection fails, the subsequent steps, such as relation classification, are likely to make incorrect predictions as well. Third, certain modules, which match entities in queries to KGs or predict relations over KGs, are usually not generalizable to emerging entities and relations, and cannot be applied to different KGs. It would be preferable to have a method that does not require KG-specific training and inference.
To tackle these limitations, we propose to directly retrieve the triplets relevant to a natural language query by computing their similarities over a shared representation space (see Figure 1b). The design of our direct retrieval framework is motivated by pioneering work on open-domain question answering over documents (Karpukhin et al., 2020), which showed the possibility of dense retrieval with simple vector similarities between question and document embeddings. However, in contrast to the document retrieval scenario, where documents have sufficient context to embed, it is unclear whether an LM can still effectively embed facts represented in the short triplet form for retrieval. Also, compared to document retrieval, which additionally requires a reader to extract only the relevant piece of knowledge, our fact retriever itself can directly provide the relevant knowledge.
To realize our fact retriever, we train it by maximizing the similarities between representations of relevant pairs of input texts and triplets while minimizing those of irrelevant pairs, where we use LMs to encode them. We note that this process requires only text-triplet pairs, without extra labels, unlike the conventional pipeline approach for fact retrieval. After training, we index all triplets in the KG with the trained encoder in an offline manner, and, given an input query, we return the nearest triplets in the embedding space. This procedure simplifies the conventional three steps for retrieving facts from KGs into one. Here, to search the relevant triplets more efficiently, we approximate the similarity calculation with vector quantization and clustering-based hierarchical search (Johnson et al., 2021). We further note that, since we embed triplets using the LM, our retriever can generalize to different KGs without any modification, unlike some conventional retrieval systems that require additional training to learn a new KG schema with distinct entity and relation types. We refer to our framework as Direct Fact Retrieval (DiFaR).
We experimentally demonstrate that our direct retrieval on KGs works well; however, a fact represented in triplet form has limited context, since it consists of only two entities and one relation. Also, similarity calculation over the independently represented input text and triplets is arguably simple, and might be less effective. Therefore, to further improve retrieval performance, we additionally use a reranker, whose goal is to calibrate the ranks of the retrieved triplets for the input text. In particular, we first retrieve the k nearest facts with the direct retriever, and then use another LM that directly measures the similarity by encoding the input text and the triplet jointly. Moreover, another objective of the reranker is to filter out irrelevant triplets, namely those that are the most confusing in the embedding space of the direct retriever. Therefore, to effectively filter them, we train the reranker to minimize the similarities between the input text and the nearest yet irrelevant triplets.
We evaluate our DiFaR framework on fact retrieval tasks across two different domains, question answering and dialogue, whose goal is to retrieve relevant triplets in response to a given query. The experimental results show that our DiFaR framework outperforms relevant baselines that use conventional pipeline approaches to retrieve facts from KGs, and that our reranking strategy significantly improves retrieval performance. Detailed analyses further support the efficacy of our DiFaR framework, along with its great simplicity.
Our contributions in this work are as follows:
• We present a novel direct fact retrieval (DiFaR) framework for KGs, which leverages only the representational similarities between the query and triplets, simplifying the conventional three steps of entity detection, disambiguation, and relation classification into one.
• We further propose a reranking strategy for direct knowledge retrieval, which tackles the limited context of facts and is trained with the samples that confuse the direct retriever.
• We validate our DiFaR on fact retrieval tasks, showing that it significantly outperforms baselines in both unsupervised and supervised setups.

Background and Related Work
Knowledge Graphs Knowledge Graphs (KGs) are factual knowledge sources (Vrandecic and Krötzsch, 2014), containing a large number of facts represented in a symbolic triplet form: (head entity, relation, tail entity).
Since some natural language applications require factual knowledge (Schneider et al., 2022), existing literature proposes to use knowledge in KGs, sometimes along with language models (LMs) (Devlin et al., 2019). To mention a few, in the question answering domain, facts in KGs can directly serve as answers for knowledge graph question answering tasks (Lukovnikov et al., 2017; Chakraborty et al., 2019), and they are also often provided to LMs to generate knowledge-grounded answers (Zhang et al., 2019; Kang et al., 2022a). Similarly, in dialogue generation, some existing work augments LMs with facts from KGs (Galetzka et al., 2021; Kang et al., 2022b). However, prior to utilizing facts in KGs, fact retrieval, i.e., the selection of facts relevant to the input context, must be done in advance, and its results substantially affect downstream performance. In this work, we propose a conceptually simple yet effective framework for fact retrieval, motivated by information retrieval.

Information Retrieval
The goal of most information retrieval work is to retrieve documents relevant to a query (e.g., a question). Early work relies on term-based matching algorithms, which count lexical overlaps between the query and documents, such as TF-IDF and BM25 (Robertson et al., 1994; Robertson and Zaragoza, 2009). However, they are vulnerable to the vocabulary mismatch problem, where semantically relevant documents are lexically different from queries (Nogueira et al., 2019; Jeong et al., 2021). Due to this issue, recently proposed work instead uses LMs (Devlin et al., 2019) to encode queries and documents, and uses their representational similarities over a latent space (Karpukhin et al., 2020; Xiong et al., 2021; Qu et al., 2021). Their huge success is attributed to the effectiveness of LMs in embedding documents. However, they focus on lengthy documents having extensive context, and it is unclear whether LMs can still effectively represent each fact, succinctly expressed with two entities and one relation in triplet form, for its retrieval. In this work, we explore this new direction by formulating the fact retrieval problem as an information retrieval problem, as is done for documents.
Knowledge Retrieval from KGs Since KGs contain a large number of facts, it is important to bring in only the relevant piece of knowledge given an input query. To do so, one traditional approach uses neural semantic parsing-based methods (Dong and Lapata, 2016; Bao et al., 2016; Luo et al., 2018) that aim to translate natural language inputs into logical query languages, such as SPARQL and λ-DCS (Liang, 2013), executable over KGs. However, these methods have the limitations of requiring additional labels and an understanding of the logical forms of queries. Another approach is to use a pipeline (Bordes et al., 2014; Hao et al., 2017; Mohammed et al., 2018) consisting of three subtasks: entity span detection, entity disambiguation, and relation classification. However, such pipelines similarly require additional labels for training each subcomponent, and suffer from errors propagated from previous steps (Singh et al., 2020; Han et al., 2020). While recent work (Oguz et al., 2022) proposes to retrieve textual triplets from KGs based on their representational similarities to the input text using the information retrieval mechanism, it still relies on entity linking (i.e., span detection and entity disambiguation) first, and thus shares the limitations of the pipeline approach. Another recent work (Ma et al., 2022) merges the set of facts associated with each entity into a document and performs document-level retrieval. However, the document retrieval itself can be regarded as entity linking, and the overall pipeline requires an additional reader to extract only the relevant entity from the retrieved documents. In contrast, we directly retrieve facts for the input query based on their representational similarities, which simplifies the conventional three-step approach, including entity linking, into one single retrieval step.

Preliminaries
We formally define a KG and introduce a conventional mechanism for retrieving facts from the KG.
Knowledge Graphs Let E be a set of entities and R be a set of relations. Then, one particular fact is defined as a triplet: t = (e_h, r, e_t) ∈ E × R × E, where e_h and e_t are the head and tail entities, respectively, and r is the relation between them. Also, a knowledge graph (KG) G is defined as a set of factual triplets: G = {(e_h, r, e_t)} ⊆ E × R × E. Note that this KG is widely used as a knowledge source for many natural language applications, including question answering and dialogue generation (Oguz et al., 2022; Ma et al., 2022; Galetzka et al., 2021; Kang et al., 2022b). However, the conventional mechanism to access facts in KGs is largely complex, which may hinder its broad application; we describe it in the next paragraph.

Conventional Knowledge Graph Retrieval
The input of most natural language tasks is represented as a sequence of tokens: x = [w_1, w_2, ..., w_{|x|}]. Suppose that, given the input x, t^+ is the target triplet to retrieve. Then, the objective of the conventional fact retrieval process for the KG G (Bordes et al., 2014) is, in many cases, formalized as the following three sequential subtasks:

m̂ = argmax_m p_ψ(m | x),   ê = argmax_e p_φ(e | m̂, x),   t̂ = argmax_t p_θ(t | ê, x, G),

where p_ψ(m|x) is the model for mention detection with m as the detected entity mention within the input x, p_φ(e|m, x) is the model for entity disambiguation, and p_θ(t|e, x, G) is the model for relation classification, individually parameterized by ψ, φ, and θ, respectively. However, there are a couple of limitations in such three-step approaches. First, they are vulnerable to the accumulation of errors: for example, if the first two steps of span detection and entity disambiguation are wrong and we end up with an incorrect entity irrelevant to the given query, we cannot find the relevant triplet in the final relation prediction stage. Second, due to their decomposed structure, the three sub-modules are difficult to train in an end-to-end fashion, while requiring labels for training each sub-module. For example, to train p_ψ(m|x), which predicts the mention boundary of the entity within the input text, one additionally requires annotated pairs of the input text and its entity mentions: {(x, m)}. Finally, certain modules are usually limited to predicting the entities E and relations R specific to the particular KG schema observed during training. Therefore, they are not directly applicable to unseen entities and relations, nor to different KGs.
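For contrast with our approach, the three-step pipeline above can be sketched as follows. The toy KG, alias table, and keyword-based relation cues are purely illustrative stand-ins for the trained models p_ψ, p_φ, and p_θ; any real system would learn these components from labeled data:

```python
# A schematic of the conventional three-step KG fact retrieval pipeline.
# The three functions below are toy stand-ins for trained models
# p_psi(m|x), p_phi(e|m,x), and p_theta(t|e,x,G).

TOY_KG = {  # entity id -> list of (head, relation, tail) facts
    "Q39562": [("Q39562", "place of birth", "Baltimore"),
               ("Q39562", "occupation", "swimmer")],
}
ALIASES = {"Michael Phelps": "Q39562"}  # mention -> entity id

def detect_mention(x):
    # p_psi(m|x): find an entity span in the input text.
    for mention in ALIASES:
        if mention in x:
            return mention
    return None

def disambiguate(mention):
    # p_phi(e|m,x): map the detected mention to a KG entity id.
    return ALIASES.get(mention)

def classify_relation(x, entity_id):
    # p_theta(t|e,x,G): pick the fact whose relation matches the query.
    keywords = {"born": "place of birth", "job": "occupation"}  # toy cues
    for cue, relation in keywords.items():
        if cue in x.lower():
            for fact in TOY_KG.get(entity_id, []):
                if fact[1] == relation:
                    return fact
    return None

def pipeline_retrieve(x):
    mention = detect_mention(x)
    if mention is None:  # a failure here propagates to every later step
        return None
    entity = disambiguate(mention)
    if entity is None:
        return None
    return classify_relation(x, entity)
```

Note how a miss in the first step short-circuits the whole pipeline: a query about an entity absent from the alias table returns nothing, illustrating the error propagation discussed above.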

Direct Knowledge Graph Retrieval
To tackle the aforementioned challenges of the existing fact retrieval approaches on KGs, we present our direct knowledge retrieval framework. In particular, our objective is simply formulated with a single sentence encoder model E_θ, without introducing extra variables (e.g., m and e), as follows:

t̂ = argmax_{t ∈ G} f(E_θ(x), E_θ(t)),    (2)

where f is a non-parametric scoring function that calculates the similarity between the input text representation E_θ(x) and the triplet representation E_θ(t), for example, via the dot product. Note that, in Equation 2, we use the sentence encoder E_θ to represent the triplet t. To do so, we first symbolize the triplet as a sequence of tokens: t = [w_1, w_2, ..., w_{|t|}], constructed from the entity and relation tokens with a separation token (i.e., the special token [SEP]) between them. Then, we simply forward the triplet tokens to E_θ to obtain the triplet representation. While we use a single model for encoding both input queries and triplets, we might alternatively represent them with different encoders, which we leave as future work.
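As a concrete illustration of Equation 2, the sketch below verbalizes a triplet with [SEP] separators and scores it against a query via a dot product of unit-normalized embeddings. The hashing-based encoder is a toy stand-in for the LM encoder E_θ, used only to make the interface concrete:

```python
import zlib
import numpy as np

def verbalize(triplet):
    # Serialize (head, relation, tail) into one token sequence with [SEP].
    return " [SEP] ".join(triplet)

# Toy stand-in for the shared sentence encoder E_theta: a fixed random
# projection of a hashed bag-of-words (a real system uses an LM).
rng = np.random.default_rng(0)
VOCAB, DIM = 1000, 64
PROJ = rng.normal(size=(VOCAB, DIM))

def encode(text):
    vec = np.zeros(DIM)
    for tok in text.lower().split():
        vec += PROJ[zlib.crc32(tok.encode()) % VOCAB]
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def score(x, t):
    # f(E(x), E(t)) as a dot product of unit vectors.
    return float(encode(x) @ encode(verbalize(t)))
```

Since both sides are normalized, the score is a cosine similarity; the framework itself only requires that f be decomposable into two independent encodings.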
Training After formalizing the goal of our direct knowledge retrieval framework in Equation 2, the next step is to construct the training samples and the optimization objective for the model (i.e., E_θ). According to Equation 2, the goal of our model is to minimize the distances between the input text and its relevant triplets over an embedding space, while maximizing the distances of irrelevant pairs. Therefore, following existing dense retrieval work for documents (Karpukhin et al., 2020), we use a contrastive loss as our objective to generate an effective representation space, formalized as follows:

L = -log [ exp(f(E_θ(x), E_θ(t^+))) / Σ_{(x, t) ∈ τ} exp(f(E_θ(x), E_θ(t))) ],

where τ contains the set of pairs between the input text and all triplets in the same batch. In other words, (x, t^+) ∈ τ is the positive pair whose similarity we maximize, whereas the others are negative pairs whose similarities we minimize. Also, exp(·) is the exponential function.
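The in-batch contrastive objective above can be sketched in NumPy over precomputed embeddings; a real implementation would use an autodiff framework such as PyTorch so that gradients flow back into E_θ:

```python
import numpy as np

def contrastive_loss(q_emb, t_emb):
    """In-batch contrastive loss: q_emb[i] is the query embedding for
    example i and t_emb[i] its relevant (positive) triplet embedding;
    all other triplets in the batch act as negatives."""
    sims = q_emb @ t_emb.T                     # (B, B) similarity matrix
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # mean of -log p(t_i+ | x_i)
```

When every query is already perfectly aligned with its positive triplet, the loss approaches zero; with uninformative (all-equal) similarities it equals log B for batch size B.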
Inference During the inference stage, given the input text x, the model should return the relevant triplets whose embeddings are closest to the input text embedding. Since E_θ(x) and E_θ(t) in Equation 2 are decomposable, to do this efficiently, we represent and index all triplets in an offline manner. We use the FAISS library (Johnson et al., 2021) for triplet indexing and similarity calculation, since it provides extremely efficient search, known to scale to billions of dense vectors, and is therefore suitable for fact retrieval from KGs. Moreover, to further reduce the search cost, we use an approximate nearest neighbor search algorithm, namely Hierarchical Navigable Small World search with a scalar quantizer. This mechanism not only quantizes the dense vectors to reduce the memory footprint, but also builds a hierarchical graph structure to efficiently find the nearest neighbors with few explorations. We term our Direct Fact Retrieval method DiFaR.
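The offline indexing and online search can be sketched as follows. For clarity, this sketch uses an exact brute-force dot-product search in NumPy; the paper replaces this exact search with FAISS's HNSW index plus scalar quantization. The `encode_fn` argument stands in for the trained encoder E_θ applied to a verbalized triplet:

```python
import numpy as np

class TripletIndex:
    """Minimal dense index: embed all triplets offline, then answer
    queries with a dot-product nearest-neighbor search. A production
    system would swap this for FAISS (HNSW graph + scalar quantizer)."""

    def __init__(self, encode_fn):
        self.encode_fn = encode_fn
        self.triplets, self.matrix = [], None

    def build(self, triplets):
        # Offline step: embed and index every triplet in the KG once.
        self.triplets = list(triplets)
        self.matrix = np.stack([self.encode_fn(t) for t in self.triplets])

    def search(self, query_vec, k=5):
        # Online step: one matrix-vector product, then take the top-k.
        scores = self.matrix @ query_vec
        top = np.argsort(-scores)[:k]
        return [(self.triplets[i], float(scores[i])) for i in top]
```

Because the query and triplet encodings are decomposable, the expensive embedding of the KG happens once; each query costs only an encoder forward pass and a (possibly approximate) nearest-neighbor lookup.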

Reranking for Accurate Fact Retrieval
The fact retrieval framework outlined in Section 3.2 simplifies the conventional three subtasks used to access knowledge into a single retrieval step. However, contrary to the document retrieval case, a fact is represented in the most compact triplet form, which consists of only two entities and one relation. Therefore, it might be suboptimal to rely on the similarity calculated from the independently represented input text and triplets as in Equation 2. Also, it is particularly important to find the correct triplet within a small k (e.g., k = 1) of the top-k retrieved triplets, since, in the scenario of augmenting LMs with facts, forwarding several triplets to LMs incurs a huge computational cost.
To tackle these challenges, we propose to further calibrate the ranks of the triplets retrieved by our DiFaR framework. Specifically, we first obtain the k nearest facts in response to the input query over the embedding space, using the direct retrieval mechanism defined in Section 3.2. Then, we use another LM, E_φ, which returns a similarity score for the pair of the input text and a retrieved triplet by encoding them simultaneously, unlike the fact retrieval in Equation 2. In other words, we first concatenate the token sequences of the input text and the triplet: [x, t], where [·] is the concatenation operation, and then forward it to E_φ([x, t]). By doing so, the reranking model E_φ can effectively consider token-level relationships between the two inputs (i.e., the input query and the triplet), which leads to accurate calibration of the ranks of the triplets retrieved by DiFaR, especially for the top-k ranks with small k.
For training, similar to the objective of DiFaR defined in Section 3.2, we aim to maximize the similarities of positive pairs: {(x, t^+)}, while minimizing the similarities of irrelevant pairs: {(x, t)} \ {(x, t^+)}. To do so, we use a binary cross-entropy loss. However, contrary to the negative sampling strategy defined in Section 3.2, where we randomly sample negative pairs, in reranker training we additionally construct negatives from the initial retrieval results of our DiFaR. The intuition is that the irrelevant triplets included among the k nearest neighbors of the input query are the most confusing examples, which are not yet filtered out by the DiFaR model. The goal of the reranking strategy is to filter them by refining the ranks of the k retrieved triplets; therefore, to achieve this goal, we include them as negative samples during reranker training. Formally, let τ̄ = {(x, t)} be the set of pairs of the input query x and its k nearest facts retrieved by DiFaR. Then, the negative samples for the reranker are defined by excluding the positive pairs: τ̄ \ {(x, t^+)}. Note that constructing the negative samples with retrieval at every training iteration is costly; therefore, we create them at intervals of several epochs (e.g., ten), and we also use only a subset of the triplets in the KG during this retrieval. Our proposed framework with the reranking strategy is referred to as Direct Fact Retrieval with Reranking (DiFaR²).
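Constructing the reranker's hard negatives from DiFaR's top-k results can be sketched as follows; `retrieved_topk` is assumed to come from the (periodically refreshed) retrieval step described above:

```python
def build_reranker_examples(query, retrieved_topk, gold):
    """Turn DiFaR's top-k retrieval for one query into reranker training
    pairs for a binary cross-entropy loss: the gold triplet is the
    positive (label 1); retrieved-but-irrelevant triplets are hard
    negatives (label 0), i.e., the set of retrieved pairs minus the
    positive pair."""
    examples = [(query, gold, 1)]
    for t in retrieved_topk:
        if t != gold:
            examples.append((query, t, 0))
    return examples
```

Because the negatives are drawn from the retriever's own nearest neighbors rather than sampled at random, they are exactly the confusable triplets the reranker must learn to push down.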

Experimental Setups
We explain datasets, models, metrics, and implementations. For additional details, see Appendix A.

Datasets
We validate our Direct Fact Retrieval (DiFaR) on fact retrieval tasks, whose goal is to retrieve relevant triplets over KGs given the query. We use four datasets on question answering and dialogue tasks.
Question Answering The objective of KG-based question answering (QA) tasks is to predict the factual triplets in response to a given question, where the predicted triplets are direct answers. For this task, we use three datasets, namely SimpleQuestions (Bordes et al., 2015), WebQuestionsSP (WebQSP), and Mintaka (Sen et al., 2022). Note that SimpleQuestions and WebQSP are designed with the Freebase KG, and Mintaka is designed with the Wikidata KG (Vrandecic and Krötzsch, 2014).
Dialogue In addition to QA, we evaluate our DiFaR on KG-based dialogue generation, in which one subtask is to retrieve relevant triplets from the KG that provide the factual knowledge needed to respond to the given dialogue context. For this task, we use the OpenDialKG dataset.

Baselines and Our Models
We compare our DiFaR framework against other relevant baselines that involve subtasks such as entity detection, disambiguation, and relation prediction. Note that most existing fact retrieval work either uses labeled entities in queries or uses additional labels for training subcomponents; therefore, it is not comparable to DiFaR, which uses only pairs of input texts and relevant triplets. For evaluation, we include models categorized as follows: Retrieval with Entity Linking: It predicts relations over the candidate triplets associated with the entities identified by entity linking methods, namely spaCy (Honnibal et al., 2020), GENRE (De Cao et al., 2021), BLINK, and ReFinED (Ayoola et al., 2022) for Wikidata; GrailQA (Gu et al., 2021) for Freebase.
Factoid QA by Retrieval: It retrieves entities and relations independently based on their similarities with the input query (Lukovnikov et al., 2017).
Our Models: Our Direct Fact Retrieval (DiFaR) directly retrieves the triplets nearest to the input text in the latent space. DiFaR with Reranking (DiFaR²), also ours, additionally includes a reranker to calibrate the retrieved results.
Retrieval with Gold Entities: It uses the labeled entities in inputs and retrieves from the triplets associated with those entities. It is not comparable to the other models.

Evaluation Metrics
We measure the retrieval performance of models with standard ranking metrics, computed from the ranks of correctly retrieved triplets. In particular, we use Hits@K, which measures whether the top-K retrieved triplets include a correct answer, and Mean Reciprocal Rank (MRR), which takes the rank of the first correct triplet for each input text and then averages the reciprocal ranks over all queries. Following existing document retrieval work (Xiong et al., 2021; Jeong et al., 2022), we consider the top-1000 retrieved triplets when calculating MRR, since considering the ranks of all triplets in KGs is computationally prohibitive.
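Both metrics can be computed from the 1-based rank of the first correct triplet per query; the cutoff argument mirrors our use of the top-1000 retrieved triplets for MRR. This is a minimal sketch of the metric definitions, not our exact evaluation script:

```python
def hits_at_k(ranks, k):
    """ranks: 1-based rank of the first correct triplet for each query,
    or None if no correct triplet was retrieved."""
    return sum(r is not None and r <= k for r in ranks) / len(ranks)

def mrr(ranks, cutoff=1000):
    """Mean Reciprocal Rank; reciprocal rank is 0 when no correct
    triplet appears within the cutoff."""
    return sum(1.0 / r for r in ranks if r is not None and r <= cutoff) / len(ranks)
```

For example, with ranks [1, 3, None] over three queries, Hits@1 is 1/3, Hits@3 is 2/3, and MRR is (1 + 1/3) / 3.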

Implementation Details
We use DistilBERT as the retriever for all models, and a lightweight MiniLM model as the reranker, both of which are pre-trained on the MSMARCO dataset (Nguyen et al., 2016). During reranking, we use the top-100 triplets retrieved by DiFaR. We use the off-the-shelf models for unsupervised settings, and further train them for supervised settings.

Experimental Results and Analyses
Main Results We first conduct experiments in the question answering domain and report the results in Table 1. As shown in Table 1, our DiFaR with Reranking (DiFaR²) framework significantly outperforms all baselines on all datasets, across both unsupervised and supervised experimental settings, with large margins. We further experiment in the dialogue domain and report the results in Table 2. As shown in Table 2, similar to the results in the QA domain, our DiFaR² framework substantially outperforms the relevant baselines. These results on two different domains demonstrate that our DiFaR² framework is highly effective for fact retrieval tasks. To see the performance gains from our reranking strategy, we compare our model variants: DiFaR and DiFaR². As shown in Table 1 and Table 2, compared to DiFaR, DiFaR² with the reranker brings huge performance improvements, especially on the challenging datasets: Mintaka and OpenDialKG. However, we consistently observe that DiFaR by itself also shows superior performance against all baselines, except against the Factoid QA by Retrieval model on the SimpleQuestions dataset. The inferior performance of DiFaR on SimpleQuestions arises because its samples are automatically constructed from facts in KGs; therefore, it is extremely simple to extract entities and predict relations in response to the input query. On the other hand, our DiFaR framework sometimes outperforms the incomparable model, Retrieval with Gold Entities, which uses the labeled entities in the input queries. This is because that model is restricted to retrieving facts associated with the entities in input queries, whereas our DiFaR is not limited to query entities thanks to its direct retrieval scheme.
Analyses on Zero-Shot Generalization Our DiFaR generalizes not only to different datasets with the same KG, but also to datasets with other KGs, without any modifications. This is because it retrieves triplets based on their text-level similarities to input queries and does not leverage a particular schema of entities and relations, unlike existing entity linking methods. To demonstrate this, we perform zero-shot transfer learning experiments, where we apply the model trained on the WebQSP dataset with the Wikidata KG to different datasets with the same KG, and also to ones with the different Freebase KG. As shown in Table 3, our DiFaR frameworks generalize effectively to different datasets and KGs; meanwhile, the pipeline approaches involving entity linking are not generalizable to different KGs and are inferior to ours.
Analyses on Single- and Multi-Hops To see whether our DiFaR frameworks can also perform challenging multi-hop retrieval, which requires selecting triplets not directly associated with the entities in input queries, we break down the performances by single- and multi-hop query types. As shown in Figure 2, our DiFaR can directly retrieve relevant triplets regardless of whether they are associated with entities in the input queries (single-hop) or not (multi-hop), since it does not rely on query entities for fact retrieval. Also, we observe that our reranking strategy brings huge performance gains, especially on multi-hop queries. However, due to the intrinsic complexity of multi-hop retrieval, performance on such queries is relatively lower than in single-hop cases. Therefore, although the majority of queries are answerable with single-hop retrieval and our DiFaR can handle multi-hop queries, it would be valuable to further extend the model for multi-hop retrieval, which we leave as future work.

Figure 3: Performances and efficiencies of our DiFaR² with varying K, where we change the number of top-K retrieved triplets when leveraging the reranking mechanism. We report results with the relative improvement (%) over our DiFaR without reranking. We report the time as an average over 30 runs.

We also provide examples of facts retrieved by our DiFaR framework in Table 4. As shown in Table 4, since the LM used for encoding both the question and the triplets for retrieval may have learned background knowledge about them during pre-training, our DiFaR framework can directly retrieve relevant triplets even for complex questions. For instance, in the first example of Table 4, the LM already knows who the US president in 1963 was, and directly retrieves his religion. Additionally, we provide more retrieval examples of our DiFaR framework in Appendix B.2 and Table 6 for both single- and multi-hop questions.
Analyses on Reranking with Varying K While Table 1 and Table 2 show huge performance improvements from our reranking strategy, its performance and efficiency depend on the number K of retrieved top-K triplets. Therefore, to analyze this further, we vary K and report the performance and efficiency in Figure 3. As shown in Figure 3, performance increases rapidly up to top-10 and saturates afterward. Also, the reranking time increases linearly with K, and, at top-10, the reranking mechanism takes less than 20% of the time required for the initial retrieval. These results suggest that it is beneficial to set K to around 10.

Sensitivity Analyses on Architectures
To see how much different retriever and reranker architectures affect performance, we perform sensitivity analyses by varying their backbones, using available models in the huggingface model library. As shown in Table 5, we observe that backbones pre-trained on the MSMARCO dataset (Nguyen et al., 2016) show superior performance compared to the naive backbones, namely DistilBERT and MiniLM, for both retrievers and rerankers. Also, the performance differences between models pre-trained on the same dataset (e.g., MSMARCO-TAS-B and MSMARCO-Distil) are marginal. These two results suggest that the knowledge required for document retrieval is also beneficial for fact retrieval, and that the DiFaR frameworks are robust across different backbones.

Figure 4: Entity linking results, where we measure the performances on benchmark datasets with Wikidata and Freebase KGs. Note that entity mentions of the SimpleQuestions dataset are not available; therefore, we cannot fine-tune existing entity linkers, which, unlike ours, additionally require mention labels.
Analyses on Entity Linking While our DiFaR framework is not explicitly trained to predict entity mentions in the input query and their ids in the KG, it might learn, during training, knowledge for matching the input text to its entities. To demonstrate this, we measure entity linking performance by checking whether the retrieved triplets contain the labeled entities of the input query. As shown in Figure 4, our DiFaR surprisingly outperforms the entity linking models. This might be because, thanks to direct retrieval with end-to-end learning, there is no accumulation of errors across the entity linking steps conventionally performed as mention detection followed by entity disambiguation; in addition, a fact in triplet form carries more useful information for retrieval than an entity alone.

Conclusion

In this work, we focused on the limitations of the conventional fact retrieval pipeline, usually consisting of entity mention detection, entity disambiguation, and relation classification, which not only requires additional labels for training each subcomponent but is also vulnerable to error propagation across submodules. To this end, we proposed the extremely simple Direct Fact Retrieval (DiFaR) framework. During training, it requires only pairs of input texts and relevant triplets, while, at inference, it directly retrieves relevant triplets based on their representational similarities to the given query. Further, to calibrate the ranks of retrieved triplets, we proposed to use a reranker. We demonstrated that our DiFaR outperforms existing fact retrieval baselines despite its great simplicity, and that our framework with the reranking strategy improves the performance further; for the first time, we showed that fact retrieval can be done easily yet effectively. We believe our work paves new avenues for fact retrieval and will lead to various follow-up work.

Limitations
In this section, we faithfully discuss the current limitations and potential avenues for future research.
First of all, while one advantage of our Direct Fact Retrieval (DiFaR) is its simplicity, the model architecture is arguably simple and might be less effective in handling very complex queries (Sen et al., 2022). For example, as shown in Figure 2, even though our DiFaR framework can handle input queries demanding multi-hop retrieval, its performance on such queries is far from perfect. Therefore, future work may improve DiFaR with more advanced techniques, for example, by further traversing the KG based on the facts retrieved by DiFaR. Also, while we use only text-based similarities between queries and triplets with LMs, it would be interesting to model triplets over KGs based on their graph structures and blend those representations with representations from LMs to generate a more effective search space.
Also, we focus on retrieval datasets in English. Here we would like to note that, in fact retrieval, most datasets are annotated in English, and, based on this, most existing work evaluates model performances on English samples. However, handling samples in various languages is an important yet challenging problem, and, as future work, one may extend our DiFaR to multilingual settings.

Ethics Statement
For an input query, our Direct Fact Retrieval (DiFaR) framework enables direct retrieval of factual knowledge from knowledge graphs (KGs), simplifying the conventional pipeline approach consisting of entity detection, entity disambiguation, and relation classification. However, the performance of our DiFaR framework is still not perfect, and it may retrieve incorrect triplets in response to given queries. Therefore, in high-risk domains such as biomedicine, our DiFaR should be used carefully, and it may be necessary to analyze the retrieved facts before making critical decisions.