Query2Particles: Knowledge Graph Reasoning with Particle Embeddings

Answering complex logical queries on incomplete knowledge graphs (KGs) with missing edges is a fundamental and important task for knowledge graph reasoning. Query embedding methods answer these queries by jointly encoding queries and entities into the same embedding space, and the answer entities are then selected according to the similarities between the entity embeddings and the query embedding. Since the answers to a complex query are obtained from a combination of logical operations over sub-queries, the embeddings of the answer entities may not always follow a uni-modal distribution in the embedding space. Thus, it is challenging to simultaneously retrieve a set of diverse answers using a single, concentrated query representation such as a vector or a hyper-rectangle. To better cope with queries with diversified answers, we propose Query2Particles (Q2P), a complex KG query answering method that encodes each query into multiple vectors, named particle embeddings. Candidate answers can then be retrieved from different areas of the embedding space using the maximal similarity between the entity embeddings and any of the particle embeddings. Meanwhile, corresponding neural logic operations are defined to support reasoning over arbitrary first-order logic queries. Experiments show that Query2Particles achieves state-of-the-art performance on complex query answering tasks on the FB15k, FB15k-237, and NELL knowledge graphs.


Introduction
Reasoning over a factual knowledge graph (KG) is the process of deriving new knowledge or conclusions from the existing data in the knowledge graph (Chen et al., 2020). A recently developed sub-task of knowledge graph reasoning is complex query answering, which aims to answer complex queries over large knowledge graphs (Hamilton et al., 2018). Compared to KG completion tasks (Liu et al., 2016; West et al., 2014), complex query answering requires reasoning over multi-hop relations and logical operations. As shown in Figure 1, complex KG queries are defined in predicate logic forms with relation projection operations, existential quantifiers ∃, logical conjunctions ∧, disjunctions ∨, and negations ¬. Answering these queries is challenging because real-world knowledge graphs, such as Freebase (Bollacker et al., 2008), NELL (Carlson et al., 2010), and DBPedia (Bizer et al., 2009), are incomplete. Consequently, sub-graph matching methods cannot be used to find the answers.
To address the challenge raised from the incompleteness of knowledge graphs, the query embedding methods are proposed (Hamilton et al., 2018;Sun et al., 2020). In this line of research, the queries and entities are jointly encoded into the same embedding space, and the answers are retrieved based on similarities between the query embedding and entity embeddings. In general, there are two steps in encoding a query to the vector space. First, a query is parsed into a computational graph with a directed acyclic graph (DAG) structure, as shown in Figure 2 (A). Then, the query representation is iteratively computed following the neural logic operations and relation projections in the DAG.
Figure 2: (A) The computational graph corresponding to the query "where did the non-Canadian Turing award laureates graduate from?" (B) Query2Particles encodes each query into a set of vectors, called particle embeddings. The logical operations iteratively compute particle embeddings following the computational graph. The answers are determined by the maximum similarities between the entity embeddings and any of the resulting particle embeddings.

Although the query embedding methods are robust for dealing with the incompleteness of KGs, the embedding structure used for encoding the queries can be improved. Because of the multi-hop and compositional nature of complex KG queries, a single query may contain multiple sufficiently diverse answers. Thus, the ideal query embedding may follow a multi-modal distribution in the embedding space. For example, the answers to the query, "Find entities, who are not American, were the Nobel Prize winners and eventually moved to the US," involve intermediate entities with different attributes, such as gender, nationality, research fields, etc. It is difficult to use a single embedding vector to find all final answer embeddings. Box embedding partially solved this problem, but for complicated attributes, a single box may be too coarse: the intermediate entities may be distributed far away from each other, forming several disjoint clusters rather than a single big region in the embedding space. For query embedding methods, the capability to simultaneously encode a set of answers from different areas is therefore necessary.
To better address the diversity of answers, we propose Query2Particles, a new query embedding method for complex query answering. In this approach, each query is encoded into a set of vectors in the embedding space, called particle embeddings. The particle embeddings of a query are iteratively computed by following the computational graph parsed from the query. Then the answers to this query are determined by the maximum similarities between the entity embeddings and any of the resulting particle embeddings. Experimental results show that Query2Particles achieves state-of-the-art performance on complex query answering over three standard knowledge graphs: FB15k, FB15k-237, and NELL. Meanwhile, the inference speed of Query2Particles is comparable to other query embedding methods and is higher than that of query decomposition methods on multi-hop queries. Further analysis indicates that the optimal number of particles for a query type depends on the structure of the query. Our experimental code is released on GitHub.

Footnote 1: A multi-modal distribution is a distribution with two or more distinct peaks in its probability density function.

Figure 3: In the example embedding space, the yellow dots are the answer entities, and the blue dots are the non-answer entities. The purple areas in (B), (C), and (D) demonstrate the neighborhoods of the vector embedding, the box embedding, and the particle embeddings, respectively. In this case, the particle embeddings are more suitable for finding the answers clustered in different areas of the embedding space.

Related Work
Other query embedding approaches are closely related to our work. These query embedding methods leverage different structures to encode logical KG queries, and they can answer various scopes of logical queries. The GQE method proposed by Hamilton et al. (2018) can answer conjunctive queries by representing queries as vectors. Query2Box used hyper-rectangles to encode and answer existential positive first-order (EPFO) queries. At the same time, Sun et al. (2020) proposed to improve the faithfulness of the query embedding method by using centroid-sketch representations on EPFO queries. The conjunctive queries and EPFO queries are both subsets of first-order logic (FOL) queries. The Beta Embedding is the first query embedding method that supports the full set of operations in FOL by encoding entities and queries as probabilistic Beta distributions. In contemporaneous work, Zhang et al. (2021) use cone embeddings to encode FOL queries. As shown in Figure 3, compared to these query embedding approaches, the Q2P method can encode FOL queries while addressing the diversity of answers. Note that previous query embedding methods proposed to use the disjunctive normal form (DNF) to address the answer diversity resulting from the union operations. This partly solves the problem, but the diversity of the answers is not solely caused by the union operation; it is a joint effect of multi-hop projections, intersections, and complements. As a result, using particle embeddings is a more general solution.
Query decomposition (Arakelyan et al., 2020) is another approach to answering complex knowledge graph queries. In this line of research, a complex query is decomposed into atomic queries, and the probabilities of atomic queries are modeled by link predictors. In the inference process, continuous optimization and beam search are used for finding the answers.
Meanwhile, the rule and path-based methods (Guo et al., 2016;Xiong et al., 2017;Lin et al., 2018;Guo et al., 2018;Chen et al., 2019) use pre-defined or learned rules to do multi-hop KG reasoning. These methods explicitly model the intermediate entities in the query. Instead, the query embedding methods directly embed the complex query and retrieve the answers without explicit modeling intermediate entities. So the query embedding methods are more scalable to large knowledge graphs and complex query structures.
Neural link predictors (Wang et al., 2014;Trouillon et al., 2016;Dettmers et al., 2018;Sun et al., 2018) are also related to this work. The link predictors learn the distributed representations of entities and relations in embedding space and use different neural structures to classify whether there exists a certain relation between two entities. The link predictors can be used for one-hop queries, but cannot be directly used for answering complex queries.

Preliminaries
In this section, we formally define complex logical knowledge graph queries and the corresponding computational graphs. Knowledge graph reasoning is conducted on a multi-relational knowledge graph G = (V, R), where each vertex v ∈ V represents an entity, and each relation r ∈ R is a binary function defined as r : V × V → {0, 1}. For any r ∈ R and u, v ∈ V, there is a relation r between entities u and v if and only if r(u, v) = 1.
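This definition can be sketched concretely: each relation is a set of (head, tail) pairs, and the binary function r(u, v) tests membership. A minimal illustration (the entity and relation names below are hypothetical, not taken from the benchmark graphs):

```python
# A toy multi-relational KG G = (V, R). Each relation r is stored as a set of
# (head, tail) pairs, so r(u, v) = 1 iff (u, v) is in that set.
V = {"turing", "hinton", "princeton", "toronto"}
relations = {
    "graduated_from": {("turing", "princeton"), ("hinton", "toronto")},
}

def holds(r, u, v):
    """The binary relation function r : V x V -> {0, 1}."""
    return 1 if (u, v) in relations[r] else 0
```

For example, `holds("graduated_from", "turing", "princeton")` evaluates to 1.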

First-Order Logic Query
The complex knowledge graph query is defined in first-order logic form with logical operators such as existential quantifiers ∃, conjunctions ∧, disjunctions ∨, and negations ¬. In a first-order logic query, there is a set of anchor entities V_a ⊆ V, existentially quantified variables V_1, V_2, ..., V_k ∈ V, and a unique target variable V_? ∈ V. The query intends to find the answers V_? ∈ V such that there simultaneously exist V_1, V_2, ..., V_k ∈ V satisfying the logical expression in the query. Each FOL query can be converted to a disjunctive normal form, in which the query is expressed as a disjunction of several conjunctive expressions:

q[V_?] = V_? . ∃V_1, ..., V_k : c_1 ∨ c_2 ∨ ... ∨ c_n.

Each c_i represents a conjunctive expression of several literals e_{ij}, and each e_{ij} is an atomic expression or the negation of an atomic expression, taking one of the following forms:

e = r(v_a, V), e = ¬r(v_a, V), e = r(V, V'), or e = ¬r(V, V').

Here v_a ∈ V_a is one of the anchor entities, and V, V' ∈ {V_?, V_1, ..., V_k} are variables.

Computational Graph and Operations
As shown in Figure 2 (A), for a first-order query there is a corresponding computational graph. In the computational graph, each node corresponds to an intermediate query embedding, and each edge corresponds to a neural logic operation defined in the following section. Both the input and output of these operations are query embeddings. These operations implicitly model set operations over the intermediate answer sets: (1) Relational Projection: given a set of entities A ⊆ V and a relation r ∈ R, the relational projection returns all entities having relation r with at least one entity e ∈ A, namely A_r = {v ∈ V | ∃ e ∈ A, r(e, v) = 1}; (2) Intersection: given sets of entities A_1, ..., A_n ⊆ V, this operation computes their intersection ∩_{i=1}^{n} A_i; (3) Union: given sets of entities A_1, ..., A_n ⊆ V, the union operation computes their union ∪_{i=1}^{n} A_i; (4) Complement: given a set of entities A ⊆ V, the complement operation computes its absolute complement V − A.
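On a fully observed graph, the four set operations above can be written directly (this is only a sketch of the symbolic semantics that the neural operations approximate; the toy entities and relation name are hypothetical):

```python
# The four set operations over a toy KG, stated symbolically.
V = {"a", "b", "c", "d"}
relations = {"r": {("a", "b"), ("a", "c"), ("d", "c")}}

def project(A, r):
    # All entities v with r(e, v) = 1 for at least one e in A.
    return {v for (u, v) in relations[r] if u in A}

def intersect(*sets):
    return set.intersection(*sets)

def union(*sets):
    return set.union(*sets)

def complement(A):
    # Absolute complement with respect to the full entity set V.
    return V - A
```

Because real KGs are incomplete, these symbolic operations miss answers reachable only through missing edges, which is exactly why the paper replaces them with neural operations over embeddings.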

Query2Particles
In this section, we first introduce the particle embeddings structure and the neural logic operations, and then we present the learning of the model.

Particles Representations of Queries
In Query2Particles, each query is represented as a set of vectors, called particles. For simplicity, a set of particles {p^(k)}_{k=1}^{K} is represented as a matrix P. All the operations discussed in the following sections are invariant to permutations of the particle vectors in the matrix. Formally, the particle embeddings are

P = [p^(1), p^(2), ..., p^(K)] ∈ R^{d×K},

where each vector p^(k) ∈ R^d is a particle vector. As shown in Figure 2, the computation along the computational graph starts with the anchor entities, such as "Turing Award". Suppose the entity embedding of an anchor entity v is denoted as e_v ∈ R^d. Then the initial particle embeddings are computed as the sum of e_v and a learnable offset matrix M ∈ R^{d×K}:

P_0 = e_v + M.

Here and in the following sections, the addition between a matrix M and a vector e_v is defined as broadcasted element-wise addition.
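The initialization can be sketched in a few lines of numpy (toy dimensions; in practice d and K are hyperparameters, and M would be a trained parameter rather than random):

```python
import numpy as np

d, K = 4, 3                     # toy embedding dimension and particle count
rng = np.random.default_rng(0)

e_v = rng.normal(size=d)        # anchor entity embedding e_v in R^d
M = rng.normal(size=(d, K))     # learnable offset matrix M in R^{d x K}

# Broadcasted element-wise addition: every particle (column) is e_v plus
# its own learnable offset, so the K particles start near the anchor entity.
P0 = e_v[:, None] + M           # particle embeddings P0 in R^{d x K}
```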

Logical Operations
In this sub-section, we define and parameterize four types of neural logic operations: projection, intersection, complement (negation), and union.

Projection
Suppose e_l ∈ R^d is the embedding vector of the relation l. The relation projection f_P is expressed as P_{i+1} = f_P(P_i, e_l), where P_i and P_{i+1} are the input and output particle embeddings. Instead of directly adding the same relation embedding e_l to all particles in P_i to model the relation projection following (Bordes et al., 2013), we incorporate neuralized gates (Chung et al., 2014) to individually adjust the relation transition for each particle in P_i, expressed as follows:

Z = σ(W_z^P e_l + U_z P_i),
R = σ(W_r^P e_l + U_r P_i),
T = φ(W_h^P e_l + U_h (R ⊙ P_i)),
P'_{i+1} = (1 − Z) ⊙ P_i + Z ⊙ T.

Here, σ and φ are the sigmoid and hyperbolic tangent functions, and ⊙ is the Hadamard product. W_z^P, W_r^P, W_h^P, U_z, U_r, U_h are parameter matrices. T is interpreted as the relation transition for each of the particles given the relation embedding e_l, and Z and R are the update gate and the reset gate used for customizing the relation transitions for each particle. Meanwhile, the relation projection result for each particle should also depend on the positions of the other input particles. To allow information exchange among different particles, a scaled dot-product self-attention (Vaswani et al., 2017) module is also incorporated:

P_{i+1} = Attn(W_q^P P'_{i+1}, W_k^P P'_{i+1}, W_v^P P'_{i+1}).

W_q^P, W_k^P, W_v^P ∈ R^{d×d} are parameters used for computing the input Query, Key, and Value of the self-attention module Attn, which denotes scaled dot-product self-attention over the particle columns. Here, Q, K, and V represent the input Query, Key, and Value of the attention layer.
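A minimal numpy sketch of this gated projection follows. The GRU-style update is our reading of the gate description above, and all parameter matrices are assumed to be d×d; a real implementation would use a deep learning framework with trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attn(Q, K, V):
    # Scaled dot-product attention over particle columns (all inputs d x K).
    d = Q.shape[0]
    s = (K.T @ Q) / np.sqrt(d)                       # K x K attention scores
    w = np.exp(s - s.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)                # softmax over particles
    return V @ w

def projection(P, e_l, params):
    # GRU-style gates; e_l is broadcast over the K particle columns.
    Wz, Wr, Wh, Uz, Ur, Uh, Wq, Wk, Wv = params
    Z = sigmoid(Wz @ e_l[:, None] + Uz @ P)          # update gate
    R = sigmoid(Wr @ e_l[:, None] + Ur @ P)          # reset gate
    T = np.tanh(Wh @ e_l[:, None] + Uh @ (R * P))    # per-particle transition
    P_new = (1 - Z) * P + Z * T
    # Self-attention lets particles exchange information with each other.
    return attn(Wq @ P_new, Wk @ P_new, Wv @ P_new)
```

The gates let each particle follow its own variant of the relation transition, rather than all particles being translated by the same vector.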

Intersection
The intersection operation f_I is defined on multiple sets of particle embeddings {P_i^(n)}_{n=1}^{N} and outputs a single set of particle embeddings. The input particle sets are first concatenated into a single matrix

P_i = [P_i^(1), ..., P_i^(N)] ∈ R^{d×NK},

and this matrix P_i serves as the input of the intersection operation. The operation updates the position of each particle according to the positions of the other input particles in {P_i^(n)}_{n=1}^{N}. This process is modeled by scaled dot-product self-attention followed by a multi-layer perceptron (MLP) layer:

P_{i+1} = MLP(Attn(W_q^I P_i, W_k^I P_i, W_v^I P_i)).

Here W_q^I, W_k^I, W_v^I ∈ R^{d×d} are parameters for the self-attention layer. The MLP denotes a multi-layer perceptron with ReLU activation, and the parameters of the MLP layers in different operations are not shared. To keep the number of particles unchanged, we uniformly sub-sample K particles out of the NK particles in P_{i+1} as the final output of the intersection operation.
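The concatenate-attend-subsample pipeline can be sketched as follows (the two-layer ReLU MLP and uniform sub-sampling scheme are illustrative assumptions consistent with the description above):

```python
import numpy as np

def attn(Q, K, V):
    # Scaled dot-product attention over particle columns.
    d = Q.shape[0]
    s = (K.T @ Q) / np.sqrt(d)
    w = np.exp(s - s.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    return V @ w

def intersection(particle_sets, Wq, Wk, Wv, W1, W2, K_out):
    # Concatenate the N input particle sets into one d x NK matrix.
    P = np.concatenate(particle_sets, axis=1)
    H = attn(Wq @ P, Wk @ P, Wv @ P)       # mix all NK particles
    H = W2 @ np.maximum(W1 @ H, 0.0)       # two-layer MLP with ReLU
    # Uniformly sub-sample K_out columns so the particle count stays fixed.
    idx = np.linspace(0, H.shape[1] - 1, K_out).astype(int)
    return H[:, idx]
```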

Complement
The input of the complement operation is a single set of particle embeddings P_i, and the operation f_C is formulated as P_{i+1} = f_C(P_i). The complement operation updates the position of each particle based on the distribution of the other input particles. The operation is modeled by scaled dot-product self-attention followed by an MLP layer:

P_{i+1} = MLP(Attn(W_q^C P_i, W_k^C P_i, W_v^C P_i)).

Here, P_{i+1} ∈ R^{d×K} are the resulting particle embeddings of the complement operation, and W_q^C, W_k^C, W_v^C ∈ R^{d×d} are parameters. Intuitively, this structure can model the complement operation by encouraging the particles to move towards areas that are not occupied by any of the input particles.

Union
The union operation is directly modeled by all the input particles without extra parameterization. In detail, the particles from the input particle embeddings are directly merged into a new set of particles:

P_{i+1} = [P_i^(1), ..., P_i^(N)].

Scoring
After the particle embeddings P_T ∈ R^{d×K} for the target variable of the query q are computed, a scoring function φ between the particle embeddings P_T and each entity embedding e_v calculates the maximal similarity between the particle vectors {p_T^(k)}_{k=1}^{K} and the entity embedding vector. The inner product is used to compute the similarity between vectors, and the overall scoring function is expressed by

φ(P_T, e_v) = max_{k ∈ {1, ..., K}} ⟨p_T^(k), e_v⟩.
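As a sketch, the max-over-particles score is a single matrix-vector product followed by a max:

```python
import numpy as np

def score(P_T, e_v):
    # Inner product of each particle column of P_T with e_v, then take the
    # maximum: an entity scores highly if it is close to ANY particle.
    return float(np.max(P_T.T @ e_v))
```

This is what lets the method cover several disjoint answer clusters: each cluster only needs one nearby particle.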

Learning Query2Particles
To train the Query2Particles model, we compute the normalized probability of an entity v being a correct answer of query q by applying the softmax function to all similarity scores:

p(v | q) = exp(φ(P_T, e_v)) / Σ_{v' ∈ V} exp(φ(P_T, e_{v'})).
Then we construct the cross-entropy loss from these probabilities to maximize the log probabilities of all correct query-answer pairs:

L = − (1/N) Σ_{i=1}^{N} log p(v^(i) | q^(i)),

where (v^(i), q^(i)) denotes one of the positive query-answer pairs, and in total there are N such pairs.
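The softmax over max-similarity scores and the per-pair negative log-likelihood can be sketched as follows (a numpy illustration of the objective, not the training loop):

```python
import numpy as np

def answer_probs(P_T, E):
    # E is the |V| x d entity embedding matrix. For each entity, the score
    # phi(P_T, e_v) is its max inner product with any particle column of P_T;
    # the softmax normalizes the scores over all entities.
    scores = (E @ P_T).max(axis=1)         # one score per entity
    z = np.exp(scores - scores.max())      # numerically stable softmax
    return z / z.sum()

def nll_loss(P_T, E, answer_idx):
    # Cross-entropy: negative log-probability of the correct answer entity.
    return -np.log(answer_probs(P_T, E)[answer_idx])
```

Averaging `nll_loss` over the N positive query-answer pairs gives the training objective above.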

Experiments
The experiments in this section demonstrate the effectiveness and efficiency of Query2Particles.

Experimental Setup
The Query2Particles method is evaluated on three commonly used knowledge graphs, FB15K (Bordes et al., 2013), FB15K-237 (Toutanova and Chen, 2015), and NELL995 (Carlson et al., 2010), with the standard training, validation, and testing edge splits. For each of these graphs, the corresponding training graph G_train, validation graph G_valid, and testing graph G_test are created from the training edges, training + validation edges, and training + validation + testing edges, respectively. There are two sets of complex logical queries sampled from these knowledge graphs, and existing methods evaluate their performance on either of them. The first set contains nine different types of existential positive first-order (EPFO) queries. Five of these types (1p, 2p, 3p, 2i, 3i) are used for training and evaluation in a supervised setting. The remaining four types (2u, up, ip, pi) do not appear in the training set and are directly evaluated in a zero-shot way. The second set refines these queries by raising the difficulty of the existing nine types and adds five types of complement queries (2in, 3in, inp, pni, pin) for general first-order logic (FOL) queries. These complement queries are also trained and evaluated in the supervised setting, but their training samples are fewer than those of the other types. More details about the knowledge graphs and the sampled queries are given in the appendix. To demonstrate the performance of Query2Particles, it is evaluated on both sets of queries. Note that the query-answer pairs used for training come only from the training graph G_train. For validation and testing, only the hard answers from the validation graph G_valid and testing graph G_test are evaluated.

Baselines
The Query2Particles model is compared with the following baselines. Graph Query Embedding (GQE) answers conjunctive logic queries by encoding the logical queries into vectors (Hamilton et al., 2018).
Query2Box (Q2B) answers existential positive first-order logic queries by encoding them into boxes in the embedding space.
Beta Embedding (BetaE) answers first-order logic queries by modeling them as Beta distributions. This is the current state-of-the-art model on first-order logic queries.
The reported mean reciprocal rank (MRR) scores of these baselines are taken from the BetaE paper, and Query2Particles (Q2P) is evaluated with the same metrics under the filtered setting, in which the rankings of answers are computed excluding all other correct answers. Meanwhile, the Q2P method is also compared with the following methods on the EPFO queries.
Continuous Query Decomposition (CQD) decomposes complex queries into multiple atomic queries that can be solved by link predictors (Arakelyan et al., 2020).
Embedding Query Language (EmQL) improves the faithfulness in the reasoning process by encoding EPFO queries into centroid-sketch representations (Sun et al., 2020).
The reported Hit@3 results of these two baselines are taken from Arakelyan et al. (2020) and Sun et al. (2020). Our model is evaluated on FB15K, FB15K-237, and NELL in the same setting.

Implementation Details
The Query2Particles model is trained on the queries in an end-to-end manner. To compare fairly with previous methods, we set the embedding dimension to four hundred, the same as in prior work. We tune the hyperparameters for our model by grid search on the validation queries. In the grid search, we consider the batch size from {1024, 2048, 4096

Comparison with Baselines
First, we compare Query2Particles (Q2P) with GQE, Q2B, and BetaE on the first-order logic queries described in the experimental setup. The results on all fourteen types of queries are reported in Table 1 and Table 2. To compare fairly with the baseline methods, we keep the number of parameters used in each type of query embedding the same. As shown in Tables 1 and 2, the Q2P model achieves more accurate results than GQE, Q2B, and BetaE on all types of queries except 2u. As the number of query embedding parameters is kept the same, this indicates that the structure of particle embeddings is more suitable for encoding complex queries than boxes or Beta distributions.
Though it is slightly less accurate on the 2u queries, Q2P is more efficient in encoding queries that include union operations, because Q2P is the first embedding method that directly models the union operation. To avoid direct modeling of the union operation, all previous embedding methods pre-process the queries by converting them to DNF forms. However, the DNF forms can be exponentially larger than the original queries, and the conversion also takes exponential time. Meanwhile, BetaE proposes to use De Morgan's law to replace one union operation with one intersection and three complements, but this substitution still largely increases the query complexity. Instead, Q2P directly models the union operation without any pre-processing or additional parameterization, while achieving state-of-the-art performance on up queries, which are more complicated and involve the union operation.
We also compare our model with the EmQL and CQD methods on the EPFO queries. On average, our model has better Hit@3 scores on all datasets. Compared to the CQD method, the Q2P method is better at answering multi-hop queries. EmQL encodes complex queries into centroid-sketch representations, which cannot compactly encode sufficiently diverse answers; the Q2P method specifically addresses the diversity of answers, so it has higher empirical performance. CQD performs better on shorter queries like 1p, 2p, and 2u because it can use state-of-the-art link predictors. Also, as shown in Figure 6, the Q2P method demonstrates a faster inference speed than the CQD method on multi-hop queries, because CQD uses inference-time optimization, either continuous optimization or beam search. The inference-time optimization simplifies the learning of CQD but also slows down inference on large graphs.

The Improvement of Q2P-KP
Experiments show that the performance on diversified queries can be largely improved by using more particles. To demonstrate this effect, we conduct additional evaluations on the most diversified 10% of queries for each query type, shown in the DIVR columns of Table 4. We use the number of answers to measure the diversity of each query. In the same table, we also present the original results in the FULL columns for comparison. We observe a significant performance gap between the FULL and DIVR results, which demonstrates that diversified queries are harder to answer. Meanwhile, compared to Q2P-1P, Q2P-KP (K>1) significantly improves the MRR on DIVR queries by 7.8 points. From this perspective, the improvement of Q2P-KP (K>1) over Q2P-1P is significant on these challenging queries.

Further Ablation Study for Q2P-1P
To better explain the superior performance of Q2P-1P over the baseline models, we conduct further ablation studies in Table 5.
First, we remove all the self-attention layers Attn. The performance on intersection operations then decreases substantially, which suggests that the self-attention structure is important for aggregating information from multiple sub-queries.
Then, we remove all the neural network structures, including all MLP and Attn modules from all operations, and replace them with the operations defined in the GQE model (Hamilton et al., 2018). The performance of Q2P is again reduced, indicating that the neural structures in the particle operations are also important to the overall improvement. Thus, we infer that the baseline models underfit the complex queries in the training set, and that performance can be improved by introducing more parameters and non-linearity. This conclusion is aligned with Sun et al. (2020), who found that the baselines cannot faithfully answer even the queries observed at training time. However, solely using more complex structures cannot address the problem raised by the diversity of the answers. As shown in Table 4, on top of Q2P-1P, Q2P-KP (K>1) can still largely improve the performance on diversified queries.

Conclusion
In this paper, we proposed Query2Particles, a query embedding method for answering complex logical queries over incomplete knowledge graphs. The Query2Particles method supports a full set of FOL operations. Specifically, the Q2P method is the first query embedding method that can directly model the union operation without any pre-processing. Experimental results show that the Q2P method achieves state-of-the-art performance on answering FOL queries on three different knowledge graphs, while using inference time comparable to previous methods.

Ethical Impacts
This paper introduces a knowledge graph reasoning method, and the experiments are conducted on several publicly available benchmark datasets. As a result, there is no data privacy concern. Meanwhile, this paper does not involve human annotation, and there are no related ethical concerns.

Table 6: Basic information about the three knowledge graphs used for the experiments and their standard training, validation, and testing edge splits. The upper part of the query table describes the first set of queries, while the lower part describes the second set, which is harder and includes five additional types of queries with the complement operation.