Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport

Answering complex queries on knowledge graphs is important but particularly challenging because of data incompleteness. Query embedding methods address this issue with learning-based models that simulate logical reasoning with set operators. Previous works focus on specific forms of embeddings, whereas the scoring functions between embeddings are underexplored. In contrast to existing scoring functions motivated by either local comparison or global transport, this work investigates the local and global trade-off through unbalanced optimal transport theory. Specifically, we embed sets as bounded measures in $\mathbb{R}$ endowed with a scoring function motivated by the Wasserstein-Fisher-Rao metric. This design also facilitates closed-form set operators in the embedding space. Moreover, we introduce a convolution-based algorithm for linear-time computation and a block-diagonal kernel to enforce the trade-off. Results show that WFRE outperforms existing query embedding methods on standard datasets, on evaluation sets with combinatorially complex queries, and on hierarchical knowledge graphs. Ablation studies show that finding a better local and global trade-off is essential for performance improvement.

Formally speaking, complex logical queries can be expressed via first-order logic (Ren et al., 2020; Marker, 2002). Specific groups of queries, whose predicates and logical connectives can be converted into set operators (Wang et al., 2021), are of particular interest due to their clear semantics. The logical reasoning process of answering complex queries is therefore transformed into executing set projections and operations over an operator tree (Ren et al., 2020; Wang et al., 2021). Figure 1 shows the operator tree for the query "Who is the non-American director that has won Golden Globes or Oscar?".
What makes this task difficult is the data incompleteness of knowledge graphs. Modern large-scale KGs are naturally incomplete because they are constructed by crowdsourcing (Bollacker et al., 2008; Vrandečić and Krötzsch, 2014) or by automatic information extraction pipelines (Carlson et al., 2010). This issue is acknowledged as the Open World Assumption (OWA) (Libkin and Sirangelo, 2009). Under the OWA, applying query answering algorithms designed for complete databases does not yield complete answers. Moreover, one cannot prune the search space with the observed incomplete knowledge graph, which results in a large computational cost (Ren et al., 2020). The problem becomes even harder when answering logical queries on large knowledge graphs with billions of edges (Ren et al., 2022). We refer readers to recent surveys for more on logical queries over knowledge graphs (Wang et al., 2022b; Ren et al., 2023).

Figure 1: Answering logical queries on knowledge graphs. Natural language sentences can be interpreted as logical formulas and then converted to set operator trees (Wang et al., 2021).

Query embedding methods address this task by embedding entity sets and parameterizing set operators in the embedding space (e.g., Yang et al., 2022). However, the scoring function between sets, though it also characterizes set embeddings and plays a vital role in training models, is underexplored in the existing literature. Existing scoring functions fall into two categories, emphasizing either local comparison (Ren and Leskovec, 2020; Amayuelas et al., 2022) or global transport between geometric regions (Ren et al., 2020; Choudhary et al., 2021b; Zhang et al., 2021). The following example motivated us to develop a scoring function that trades off local comparison against global transport.
Example 1.1. Consider four "one-hot" vectors with dimension d = 100:

A = [1, 0, 0, …, 0], B = [0, 1, 0, …, 0], C = [0, 0, 1, 0, …, 0], D = [0, 0, …, 0, 1]. (3)

We observe that:
• A scoring function L based on local comparison cannot distinguish these pairs: the supports are disjoint, so L(A, B) = L(A, C) = L(A, D).
• A scoring function G based on global transport distinguishes the pairs by how far the mass travels: G(A, B) = 1, G(A, C) = 2, and G(A, D) = 99. However, G is risky for optimization. For example, if G(A, D) + G(A, B) appears in the objective function of a batch, G(A, D) will dominate G(A, B) because it is about 100 times larger, making the optimization ineffective.
• A local and global trade-off function (such as the WFR scoring function proposed in this paper) harnesses this risk by constraining the transport within a window size, which truncates the transport distances between faraway samples like A and D. With a window size of 5, WFR(A, D) = 5, and the optimization is stabilized.

In this paper, we develop a more effective scoring function motivated by the Wasserstein-Fisher-Rao (WFR) metric (Chizat et al., 2018a), which introduces such a local and global trade-off. We propose to embed sets as bounded measures in R, where each set embedding is discretized as a bounded histogram on a uniform grid of size d. This set embedding can be interpreted locally, so set intersection, union, and negation can be defined by element-wise fuzzy logic t-norms (Hájek, 1998). We propose an efficient convolution-based algorithm that computes the entropic WFR in O(d) time, and a block-diagonal kernel to enforce the local and global trade-off. We conduct extensive experiments on a large number of datasets: (1) standard complex query answering datasets over three KGs (Ren and Leskovec, 2020); (2) a large-scale evaluation set emphasizing the combinatorial generalizability of models on compositionally complex queries (Wang et al., 2021); and (3) complex queries on a hierarchical knowledge graph (Huang et al., 2022). Ablation studies show that the performance of complex query answering can be significantly improved by choosing a better trade-off between local comparison and global transport.

Related Works
In this section, we discuss other query embedding methods in fixed dimensions, as well as optimal transport. Other methods for complex query answering are discussed in Appendix A.
In this work, we establish a novel scoring function motivated by unbalanced optimal transport theory (Chizat et al., 2018a). As a variant of optimal transport, it inherits its advantages and balances local comparison and global transport.
The Wasserstein-Fisher-Rao (WFR) metric (Chizat et al., 2018a) generalizes OT between distributions to general measures by balancing local comparison and global transport through a transport radius η. Existing investigations (Zhao et al., 2020b) demonstrated that the WFR metric is a robust and effective measurement for embedding alignment. Previous work measures pretrained embeddings in the WFR space (Wang et al., 2020), while this work is the first to learn embeddings in the WFR space. Moreover, we validate the advantage of the WFR space in the context of query embedding.

Knowledge Graph and Complex Queries
A knowledge graph KG = {(h, r, t)} ⊆ V × R × V is a collection of triples, where h, t ∈ V are entity nodes and r ∈ R is a relation.
Complex queries over knowledge graphs can be defined by first-order formulas. Following previous works (Ren and Leskovec, 2020), we consider a query Q with one free variable node V_? and quantified variable nodes V_i, 1 ≤ i ≤ n. An arbitrary logical formula can be converted into the following prenex disjunctive normal form (Marker, 2002):

Q[V_?] = □V_1 ⋯ □V_n . c_1 ∨ ⋯ ∨ c_l,

where each quantifier □ is either ∃ or ∀, each c_i, 1 ≤ i ≤ l, is a conjunctive clause c_i = y_{i1} ∧ ⋯ ∧ y_{im_i}, and each y_{ij}, 1 ≤ j ≤ m_i, represents an atomic formula or its negation. That is, y_{ij} = r(a, b) or ¬r(a, b), where r ∈ R, and a and b can each be either a variable V_• or an entity in V.
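For instance, the query in Figure 1 fits this form. One possible formalization, with illustrative predicate names rather than the schema of any specific KG, is

Q[V_?] = (Win(V_?, GoldenGlobes) ∨ Win(V_?, Oscar)) ∧ ¬Nationality(V_?, USA),

which distributes into the DNF (Win(V_?, GoldenGlobes) ∧ ¬Nationality(V_?, USA)) ∨ (Win(V_?, Oscar) ∧ ¬Nationality(V_?, USA)) with l = 2 conjunctive clauses.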

Answer Queries with Set Operator Trees
Queries that can be answered by set operators are of particular interest (Ren and Leskovec, 2020). Their answers can be derived by executing set operators in bottom-up order. The leaves of each operator tree are known entities, which are regarded as singleton sets. The inputs and outputs of each set operator are all sets. We note that queries solvable by set operators form only a fragment of first-order queries, due to the additional assumptions that guarantee their conversion to operator trees (Wang et al., 2021). Moreover, the choice of set operators representing this entire class is not unique. In this work, we focus on the following operators: set projections, derived from relations; and set operations, namely set intersection, derived from conjunction, set union, derived from disjunction, and set complement, derived from negation.
The Wasserstein-Fisher-Rao Metric

Consider two discrete nonnegative measures µ = Σ_{i=1}^M µ_i δ_{x_i} and ν = Σ_{j=1}^N ν_j δ_{y_j} supported on points {x_i} and {y_j}. The WFR metric between µ and ν is defined by solving the following minimization problem.

WFR(µ, ν; η) = min_{P ∈ ℝ₊^{M×N}} Σ_{ij} C_{ij} P_{ij} + KL(P 1_N ∥ µ) + KL(P^⊤ 1_M ∥ ν), (6)

where P ∈ ℝ^{M×N} is the transport plan and P_{ij} indicates the mass transported from x_i to y_j, KL(·∥·) is the Kullback–Leibler divergence, 1_N is the all-one column vector in ℝ^N, and η is the hyperparameter for the transport radius. The ground cost C_{ij} is induced by η and becomes infinite once ∥x_i − y_j∥ ≥ η, prohibiting transport between faraway points. We denote the global minimum P* of Problem (6) as the WFR optimal transport plan.
One of the key properties of the WFR metric can be understood via geodesics in the WFR space, as stated in Theorem 4.1 of Chizat et al. (2018a). Specifically, for two mass points at positions x and y, transport applies only when ∥x − y∥ < η, such as places 1 and 2 in Figure 2; otherwise, only local comparison is counted. We see that η controls the scope of the transport process.

Entropic Regularized WFR Solution
The WFR metric in Equation (6) can be computed by the Sinkhorn algorithm after adding an entropic regularization term (Chizat et al., 2018b). Specifically, one can estimate WFR with the following entropic regularized optimization problem:

min_{P ∈ ℝ₊^{M×N}} ε KL(P ∥ K_ε) + KL(P 1_N ∥ µ) + KL(P^⊤ 1_M ∥ ν), (9)

where K_ε = e^{−C/ε} is the kernel matrix. The generalized Sinkhorn algorithm (Chizat et al., 2018b) solves the unconstrained dual of Problem (9), which maximizes a dual objective D_ε(φ, ψ) over the dual variables φ ∈ ℝ^M and ψ ∈ ℝ^N. Writing the scalings u = e^{φ/ε} and v = e^{ψ/ε}, the update of the (l+1)-th Sinkhorn iteration is

u^{(l+1)} = (µ ⊘ (K_ε v^{(l)}))^{1/(1+ε)}, (11)
v^{(l+1)} = (ν ⊘ (K_ε^⊤ u^{(l+1)}))^{1/(1+ε)}, (12)

where ⊘ denotes element-wise division. Let φ* and ψ* be the optimal dual variables obtained from a converged Sinkhorn algorithm. The optimal transport plan is recovered by

P* = diag(e^{φ*/ε}) K_ε diag(e^{ψ*/ε}). (13)

We can see that the Sinkhorn algorithm relies on matrix-vector multiplications that cost O(MN) time. In contrast to the Wasserstein metric, which can be approximated by the 1D sliced-Wasserstein distance (Carriere et al., 2017; Kolouri et al., 2019) in O((M + N) log(M + N)) time, there is no known sub-quadratic time algorithm even for the approximate WFR metric, which hinders its large-scale application. In the next section, we restrict set embeddings to bounded measures in ℝ and develop an O(d) algorithm by leveraging the sparse structure of the kernel matrix K_ε.
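As a concrete illustration, the following is a minimal PyTorch sketch of the generalized Sinkhorn iteration above, assuming unit-weight KL marginal penalties as in Problem (9); the function and variable names are ours, not from a released codebase.

import torch

def generalized_sinkhorn(mu, nu, C, eps=0.1, iters=100):
    # Entropic unbalanced OT between nonnegative histograms mu (M,) and
    # nu (N,) with ground cost C (M, N); unit-weight KL marginal penalties.
    K = torch.exp(-C / eps)                         # kernel matrix K_eps
    u, v = torch.ones_like(mu), torch.ones_like(nu)
    p = 1.0 / (1.0 + eps)                           # damping from the KL penalties
    for _ in range(iters):
        u = (mu / (K @ v).clamp_min(1e-30)) ** p    # Eq. (11)
        v = (nu / (K.T @ u).clamp_min(1e-30)) ** p  # Eq. (12)
    P = u[:, None] * K * v[None, :]                 # Eq. (13): diag(u) K diag(v)
    return u, v, P

Each iteration costs O(MN) because of the two matrix-vector products, which is exactly the bottleneck addressed in the next section.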

Wasserstein-Fisher-Rao Embedding
The goal of this section is to present how to solve complex queries with set embeddings as bounded measures in R. Let S be an arbitrary set, including the singleton set {e} with a single entity e; its embedding is m[S]. We denote the collection of all bounded measures as BM(R). Our discussion begins with the discretization of a measure m[S] ∈ BM(R) into a histogram m_S ∈ BM_d, where BM_d is the collection of bounded histograms with d bars. We then discuss how to parameterize set operators with embeddings in BM_d and how to efficiently compute the scoring function on BM_d. Finally, we introduce how to learn set embeddings and operators.

Discretize BM(R) into Histograms
We discretize each m[S] ∈ BM(R) as a histogram on a uniform mesh on R. Without loss of generality, the maximum length of the bars in the histogram is one, and the mesh spacing is ∆. In this way, it is sufficient to store the discretized mass vector m_S = [m_1^S, …, m_d^S] ∈ BM_d, because the support set {i∆}_{i=1}^d is fixed for all m[S] ∈ BM(R). We then discuss set operators on BM_d.

Set Operators on BM_d

Non-parametric Set Operations It should be stressed that the mass vector m_S ∈ BM_d can be interpreted locally, where each element of m_S is regarded as a continuous truth value in fuzzy logic. Therefore, the set operations intersection ∩, union ∪, and complement on BM_d are modeled by element-wise t-norms on the mass vectors. For the i-th element of the mass vector,

m_i^{S_1∩S_2} = ⊤(m_i^{S_1}, m_i^{S_2}),  m_i^{S_1∪S_2} = ⊥(m_i^{S_1}, m_i^{S_2}),  m_i^{S̄} = 1 − m_i^S,

where ⊤ is a t-norm and ⊥ is the corresponding t-conorm.
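As a minimal sketch of these non-parametric operations, assuming the product t-norm ⊤(x, y) = xy and its dual t-conorm ⊥(x, y) = x + y − xy (the concrete choice of t-norm is an assumption here):

import torch

def set_intersection(m1, m2):
    return m1 * m2                # product t-norm, element-wise

def set_union(m1, m2):
    return m1 + m2 - m1 * m2      # dual t-conorm, element-wise

def set_complement(m):
    return 1.0 - m                # standard fuzzy negation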
Neural Set Projections Each set projection is modeled as a function from one mass vector to another, given a relation r. We adopt base decomposition (Schlichtkrull et al., 2018) to define a Multi-Layer Perceptron (MLP):

m^{(l+1)} = σ(W_r^{(l)} m^{(l)} + b_r^{(l)}),  with  W_r^{(l)} = Σ_{j=1}^K r_j W_j^{(l)},  b_r^{(l)} = Σ_{j=1}^K r_j b_j^{(l)},

where σ is an activation function, and W_r^{(l)} and b_r^{(l)} are the weight matrix and bias vector for relation r at the l-th layer. Specifically, K is the number of bases, r ∈ ℝ^K is the relation embedding, and W_j^{(l)} ∈ ℝ^{d_{l+1}×d_l} and b_j^{(l)} ∈ ℝ^{d_{l+1}} are the base weight matrices and base bias vectors at the l-th layer, respectively.
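A sketch of one base-decomposed projection layer is given below; the sigmoid output activation, which keeps the mass in [0, 1], and the initialization are our illustrative assumptions.

import torch
import torch.nn as nn

class BaseDecomposedProjection(nn.Module):
    # One MLP layer whose weights are a relation-specific mixture of K bases.
    def __init__(self, num_relations, d_in, d_out, K):
        super().__init__()
        self.rel = nn.Embedding(num_relations, K)                  # r in R^K
        self.W = nn.Parameter(torch.randn(K, d_out, d_in) * 0.02)  # base weights
        self.b = nn.Parameter(torch.zeros(K, d_out))               # base biases

    def forward(self, m, r):
        coef = self.rel(r)                                # (batch, K)
        W_r = torch.einsum('bk,koi->boi', coef, self.W)   # mix the base matrices
        b_r = coef @ self.b                               # mix the base biases
        out = torch.einsum('boi,bi->bo', W_r, m) + b_r
        return torch.sigmoid(out)                         # keep mass in [0, 1]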
Dropout on Set Complement Inspired by dropout for neural networks, which improves generalizability, we propose to apply dropout to the set complement operation. The idea is to randomly alter elements of the mass vector before the complement operation by setting their values to 1/2; in this way, the complemented elements are also 1/2. This technique improves the generalizability of the set complement operator.
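A minimal sketch of this trick (the drop probability corresponds to the hyperparameter Drop_p in Appendix C; treating it as a function argument is our choice):

import torch

def complement_with_dropout(m, p=0.1, training=True):
    # Randomly pin elements to 1/2 before complementing; since 1 - 1/2 = 1/2,
    # the dropped elements remain 1/2 after the complement.
    if training and p > 0:
        mask = torch.rand_like(m) < p
        m = torch.where(mask, torch.full_like(m, 0.5), m)
    return 1.0 - m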

Scoring Function for BM_d
Consider m_{S_1}, m_{S_2} ∈ BM_d. It is straightforward to score this pair by WFR(m_{S_1}, m_{S_2}; η). However, directly applying the Sinkhorn algorithm requires O(d²) time, which hinders large-scale computation of the WFR metric. In this part, we introduce (1) a convolution-based Sinkhorn algorithm that reduces the complexity to O(d) time, and (2) block-diagonal transport as an additional mechanism for the local and global trade-off besides the transport radius η. We note that our contribution does not coincide with the recent linear-time "fast" Sinkhorn algorithms (Liao et al., 2022a,b), which do not apply to unbalanced optimal transport on BM_d.

Convolution-based Sinkhorn
The computational bottleneck of the Sinkhorn updates in Equations (11) and (12) is the matrix-vector multiplication. When comparing discretized measures in BM_d, both histograms share the same uniform grid, so the cost depends only on the grid offset, C_{ij} = c(|i − j|∆), and C_{ij} is infinite whenever |i − j|∆ exceeds the transport radius η. Hence K_ε exhibits a symmetric, banded structure.

Let ω = ⌊η/∆⌋ be the window size. The matrix-vector multiplication then reduces to a 1D convolution,

K_ε v = κ ∗ v,  where κ = [κ_{−ω}, …, κ_ω],  κ_j = β e^{−c(|j|∆)/ε},

with β a scale hyperparameter of the kernel. The Sinkhorn updates therefore simplify to

u^{(l+1)} = (m_{S_1} ⊘ (κ ∗ v^{(l)}))^{1/(1+ε)}, (21)
v^{(l+1)} = (m_{S_2} ⊘ (κ ∗ u^{(l+1)}))^{1/(1+ε)}. (22)

Hence, the time complexity of the Sinkhorn algorithm is reduced to O(ωd). In our setting, ω is the window size that interpolates between global transport and local comparison, and β is chosen to be 1 in every setting.
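A minimal PyTorch sketch of the convolution-based iteration follows; the concrete per-offset cost values in the kernel are an illustrative assumption (only the banded structure matters for the O(ωd) complexity):

import torch
import torch.nn.functional as F

def conv_sinkhorn(m1, m2, eps=0.1, omega=3, iters=50):
    # Entropic WFR-style Sinkhorn between histograms in BM_d, realizing the
    # banded kernel matrix-vector product as a zero-padded 1D convolution.
    d = m1.shape[-1]
    offsets = torch.arange(-omega, omega + 1, dtype=m1.dtype)
    kappa = torch.exp(-offsets.abs() / eps).view(1, 1, -1)  # assumed cost |j|

    def K(x):  # K_eps @ x in O(omega * d) time
        return F.conv1d(x.view(1, 1, -1), kappa, padding=omega).view(-1)

    u, v = torch.ones(d), torch.ones(d)
    p = 1.0 / (1.0 + eps)
    for _ in range(iters):
        u = (m1 / K(v).clamp_min(1e-30)) ** p   # Eq. (21)
        v = (m2 / K(u).clamp_min(1e-30)) ** p   # Eq. (22)
    return u, v  # optimal scalings; the plan is diag(u) K_eps diag(v)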
Once the convolution-based Sinkhorn algorithm has converged, we approximate the WFR metric via D_ε with the optimal φ* and ψ*. For complex query answering, the final answers are ranked by their distances (the smaller, the better). This process can be accelerated by primal-dual pruning for WFR-based k-nearest neighbors (Wang et al., 2020) or by Wasserstein dictionary learning (Schmitz et al., 2018).

Block Diagonal Transport Besides the window size ω, which restricts the scope of transport relative to each mass point, we provide another mechanism that restricts the scope of transport by the absolute position of each mass point. Specifically, we consider a block-diagonal kernel matrix K_ε^b of b blocks, where a = d/b is the size of each diagonal block. We can see from Equation (13) that a block-diagonal kernel leads to a block-diagonal transport plan. Figure 3 illustrates the differences between the two mechanisms for restricting global transport in terms of possible transport plans.
Computing the Scoring Function We define the scoring function Dist as the output of the convolution-based Sinkhorn algorithm with a block-diagonal kernel. It should be stressed that, under the block-diagonal kernel, a Problem (9) of size d × d decomposes into b independent problems of size a × a. This encourages greater parallelization of the Sinkhorn iterations (21) and (22). We assume a > ω, so that each block contains at least one window of WFR transport and the two mechanisms can work together. Given the parallel nature of 1D convolution, the entire distance computation can be highly parallelized on a GPU. The scoring function Dist is given in Algorithm 1.
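In the spirit of Algorithm 1, the following sketch shows how reshaping the d-dimensional histograms into b blocks of size a turns the b independent sub-problems into a single batched convolution; kernel values are again illustrative assumptions.

import torch
import torch.nn.functional as F

def dist_block_sinkhorn(m1, m2, eps=0.1, omega=3, b=4, iters=20):
    # m1, m2: histograms of size d, with d divisible by b and a = d // b > omega.
    d = m1.shape[-1]
    a = d // b
    M1 = m1.view(b, 1, a)   # b independent problems of size a, batched
    M2 = m2.view(b, 1, a)

    offsets = torch.arange(-omega, omega + 1, dtype=m1.dtype)
    kappa = torch.exp(-offsets.abs() / eps).view(1, 1, -1)

    def K(x):  # banded matrix-vector products for all blocks in one conv1d call
        return F.conv1d(x, kappa, padding=omega)

    u, v = torch.ones_like(M1), torch.ones_like(M2)
    p = 1.0 / (1.0 + eps)
    for _ in range(iters):
        u = (M1 / K(v).clamp_min(1e-30)) ** p
        v = (M2 / K(u).clamp_min(1e-30)) ** p
    # The final score aggregates the per-block entropic WFR values computed
    # from these optimal scalings (duals); we return the scalings for brevity.
    return u.view(-1), v.view(-1)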

Learning Embeddings in BM_d

Let m_Q ∈ BM_d be the query embedding of a query Q[V_?] and m_e ∈ BM_d be the set embedding of the singleton set {e} with element e. We follow the practice of Ren and Leskovec (2020) to train the parameterized projections and embeddings with negative sampling. For a query Q, we sample one answer a and K_neg negative samples {v_k}_{k=1}^{K_neg}. The objective function is

L = − log σ(γ − ρ · Dist(m_Q, m_a)) − (1/K_neg) Σ_{k=1}^{K_neg} log σ(ρ · Dist(m_Q, m_{v_k}) − γ),

where γ is the margin, ρ is the scale, and σ is the sigmoid function.
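A minimal sketch of this objective (assuming d_pos and d_neg hold the Dist values for the answer and the negative samples):

import torch.nn.functional as F

def query_loss(d_pos, d_neg, gamma=12.0, rho=1.0):
    # d_pos: Dist(query, answer), shape (batch,)
    # d_neg: Dist(query, negatives), shape (batch, K_neg)
    pos = -F.logsigmoid(gamma - rho * d_pos)
    neg = -F.logsigmoid(rho * d_neg - gamma).mean(dim=-1)
    return (pos + neg).mean()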

Experiments
In this section, we evaluate the performance of WFRE on complex query answering in three aspects: (1) we compare WFRE with other state-of-the-art query embedding methods on commonly used datasets over three knowledge graphs (Ren and Leskovec, 2020); (2) we evaluate WFRE on 301 query types to justify its combinatorial generalizability (Wang et al., 2021); (3) we train and evaluate WFRE on a complex query answering dataset on WordNet (Miller, 1995), a lexical KG whose relations are typically hierarchical (Huang et al., 2022). Aspects (2) and (3) emphasize different query types and a different underlying KG, respectively. These results provide empirical evidence of WFRE's strong applicability to various query types and KGs. Moreover, we also investigate the local and global trade-off of WFRE with respect to ω and a in an ablation study. Other results are listed in the Appendix.

Experimental Settings
For all experiments, we follow the training and evaluation practice of Ren and Leskovec (2020). We train query embeddings on the training data, select hyperparameters on the validation data, and report scores on the test data. Details about the training and evaluation protocol are described in Appendix B. For WFRE, the hyperparameters are listed and discussed in Appendix C. All experiments are conducted on a single V100 GPU with 32GB of memory using PyTorch (Paszke et al., 2019).

Benchmark Datasets
We evaluate on the datasets proposed by Ren and Leskovec (2020) over FB15k (Bordes et al., 2013a), FB15k-237 (Toutanova and Chen, 2015), and NELL (Xiong et al., 2017b). Table 1 shows that WFRE outperforms existing methods by a large margin in terms of scores averaged over queries with and without logical negation.

Combinatorial Generalization on Queries
We also explore how WFRE generalizes over the combinatorial space of complex queries, using a benchmark targeting the combinatorial generalizability of query embedding methods (Wang et al., 2021). Details of the datasets are presented in Appendix D.2. Results over 301 different query types are grouped by the number of anchor nodes and the maximum depth of the operator tree and are visualized in Figure 4. To illustrate combinatorial generalizability, we normalize the scores on each query type by the corresponding scores of BetaE, as indicated in the axis labels of Figure 4, and plot the results as lines by the number of anchor nodes and the maximum depth. Scores from the same model are located on the same vertical line. We find that WFRE not only improves performance significantly but also generalizes better on combinatorially complex queries, with a larger slope compared to LogicE (Luus et al., 2021).

Complex Queries on Hierarchical KG
The evaluations above are restricted to three commonly used knowledge graphs. We now turn to another type of underlying knowledge graph, characterized by hierarchical relations. We train and evaluate WFRE on a complex query dataset proposed by Huang et al. (2022) on WordNet (Miller, 1995). Details of this dataset are given in Appendix D.3. We compare WFRE to LinE (Huang et al., 2022), another histogram-based query embedding method proposed to solve queries on hierarchical KGs but without global transport. Table 2 shows the results on WN18RR. WFRE significantly outperforms LinE and BetaE; in particular, it improves over both on multi-hop queries, i.e., 1P, 2P, and 3P queries. It should be stressed that LinE also uses histograms, as WFRE does, but is trained with a scoring function motivated only by local comparison. This result suggests that WFRE is well suited to modeling hierarchical relations because the local and global trade-off in the scoring function learns better embeddings. It also confirms that Wasserstein spaces make embeddings more efficient (Frogner et al., 2018).

Local and Global Trade-off
We further investigate how the two mechanisms that restrict transport, i.e., the window size ω and the block size a, affect performance. Experiments are conducted on queries on FB15k-237 sampled by Ren and Leskovec (2020). We alter one value while fixing the other; the default choice is (ω, a) = (3, 5). Figure 5 shows the effect of these two hyperparameters. Compared to the recent SOTA query embedding GammaE (Yang et al., 2022), the results confirm the importance of the trade-off between local comparison and global transport. When the block size is a = 5, a larger window size ω hurts the performance on negation queries, while the performance on queries without negation (EPFO queries) peaks at a properly chosen ω = 3. When the window size is fixed at ω = 3 and a is small, the performance of EPFO and negation queries follows our observation for the window size; further increasing the block size a has little impact on EPFO queries but hurts negation queries. This indicates that a proper a is necessary for performance when ω is fixed. This observation also helps to improve the degree of parallelization of the convolution-based Sinkhorn algorithm.

Conclusion
In this paper, we propose WFRE, a new query embedding method for complex queries on knowledge graphs. The key feature of WFRE, compared with previous methods, is its scoring function that balances local comparison and global transport. Empirical results show that WFRE is the state-of-the-art query embedding method for complex query answering and generalizes well to combinatorially complex queries and hierarchical knowledge graphs. The ablation study justifies the importance of the local and global trade-off.

Limitation
WFRE shares common drawbacks with existing query embedding methods. The queries that such methods can solve are a limited subclass of first-order queries. It is also unclear how to apply WFRE to unseen entities and relations in an inductive setting.

Ethics Statement
As a query embedding method, WFRE has stronger generalizability to different query types and knowledge graphs. The experiments and evaluations in this paper involve no ethical issues and do not involve human subjects. WFRE could potentially be used to efficiently infer private information from an industrial-scale knowledge graph; this is a common potential risk for approaches targeting data incompleteness and link prediction.

A Other Methods for Complex Query Answering
Besides computing query embeddings with neural set operators, other approaches have been proposed to derive answers. Daza and Cochez (2020) and Liu et al. (2022) explored graph representations to answer logical queries with graph neural networks, while Kotnis et al. (2021) treated logical queries as sequence representations. Arakelyan et al. (2021) solves logical queries via continuous optimization problems induced by neural link predictors. However, these discussions are limited to EPFO queries without logical negation, and it is unclear how these methods handle general first-order queries. Meanwhile, neural symbolic methods estimate the probability of each entity being in the answer set (Zhu et al., 2022; Xu et al., 2022), even at each intermediate step. They therefore require O(|V| + |T|) space and time to derive the answers to a given query, where V and T are the entity set and the triple set of the knowledge graph. Compared to query embedding methods, which require only O(d), where d is the fixed dimension of the embedding space, it is challenging to scale neural symbolic methods to logical queries on large-scale knowledge graphs (Ren et al., 2022).

B Training and Evaluation Protocol
We follow the commonly used experimental settings for EFO-1 query answering, which aim to find non-trivial answers on incomplete graphs and to generalize to queries of unseen types.

Given an underlying KG G = (V, R) with triple set T, we sample three subgraphs by changing the scope of triples: T_train ⊂ T_valid ⊂ T_test = T. Following the standard evaluation protocol, we aim to find the non-trivial answers that cannot be discovered by directly traversing the graph. Denoting [q]_train as the answer set of query q on the training graph (and similarly for [q]_valid and [q]_test), the easy answers on the test split are those in [q]_valid, while the hard answers are [q]_test \ [q]_valid, which can only be obtained by reasoning or prediction. We then rank the easy (hard) answers against all non-answers V \ [q]_valid (V \ [q]_test). After obtaining the rank r of each answer, we compute the mean reciprocal rank (MRR), 1/r, and Hits at K (Hits@K), 1[r ≤ K], as metrics to measure the performance of models.
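For concreteness, a small sketch of the metric computation from the ranks described above:

def evaluate_ranks(ranks, k=10):
    # ranks: 1-indexed ranks of the hard answers among the non-answers
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits_at_k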

C Settings for WFRE
Our framework is implemented with PyTorch. Our code is based on the pipeline of the EFO-1-QA benchmark (Wang et al., 2021), and we use AdamW as the optimizer.

There are also several hyperparameters in the code. We apply dropout to the projection network and denote the drop probability as Drop_p. The maximum number of iterations of Sinkhorn's algorithm is denoted as K_S. We set the number of layers of the projection MLP to 1 based on experimental results. The hyperparameters of WFRE and their related information are listed in Table 3. We fine-tune the hyperparameters for the four datasets, and the best values are presented in Table 4.

D Datasets and Baselines
In this section, we introduce the baselines used in the three experiments. Table 5 presents the basic statistics of the queries in all the benchmark datasets.

D.1 Benchmark datasets
For the commonly used datasets (Ren and Leskovec, 2020), there are ten query types (1P, 2P, 3P, 2I, 3I, 2IN, 3IN, INP, PNI, PIN) in the training data, and four additional unseen query structures (IP, PI, 2U, UP) in the validation and test data. The corresponding query structures are visualized in Figure 6. The purpose of the unseen types in the validation and test queries is to test the combinatorial generalizability of the neural set operators.
In this part, we choose complex query embedding methods that support arbitrary EFO-1 queries as baselines, including BetaE (Ren and Leskovec, 2020), a Beta-distribution embedding whose scoring function is the KL divergence between distributions, motivated by local comparison.

D.2 Combinatorial generalization on queries

Wang et al. (2021) propose a dataset including 301 different query types to benchmark the combinatorial generalizability of CQA models. Based on EFO-1 queries represented by operator trees, EFO-1 formulas are generated with operations including entity anchoring, projection, intersection, union, and negation. To make the queries more realistic, the maximum length of projection/negation chains and the number of anchor nodes are both limited to at most 3. The baselines are BetaE and LogicE (Luus et al., 2021); their scores are taken directly from Wang et al. (2021).

D.3 Complex queries on Hierarchical KG
WN18RR was first introduced as a link prediction dataset created from WN18 (Bordes et al., 2013b), which is a subset of WordNet. There are 93,003 triples with 40,943 entities and 11 relation types in WN18RR, and most of the relations are hierarchical. Table 6 shows that seven out of eleven relations have a high antisymmetry score Khs_Gr (Huang et al., 2022) and a negative transitivity score ξ_Gr (Gu et al., 2018), and are therefore regarded as hierarchical relations. Huang et al. (2022) extended complex logical queries to WN18RR; detailed query statistics are in Table 5. They generated 14 types of queries from the hierarchical KG WN18RR, aiming to investigate the reasoning ability of query embeddings on hierarchical knowledge graphs. We choose BetaE and LinE (Huang et al., 2022) as baselines; their scores are taken from Huang et al. (2022). Notably, LinE is also a histogram-based query embedding method built on the same closed-form set operations. The key difference between LinE and WFRE is that WFRE encourages the local and global trade-off.

F Additional results
Moreover, we further compare with two query embedding methods, FuzzQE (Chen et al., 2022) and GammaE (Yang et al., 2022). Yang et al. (2022) develop a new union operation based on the self-attention mechanism and achieve better performance than DNF and DM. FuzzQE's result on FB15k is missing, as is the suggested hyperparameter setting on FB15k-237. As we could not reproduce FuzzQE's result on NELL, we list both the result reported in its paper and the one reproduced by us. In Table 7, WFRE outperforms both models except for the FuzzQE result reported in its paper.
Table 8 also provides the mean and standard deviation of our model's outputs. All scores are computed from four runs with different random seeds. The standard deviation is four orders of magnitude smaller than the mean value, which shows that WFRE is very stable and significantly outperforms previous baselines.

Figure 2: Illustration of different scoring functions. Left: global transport, where the difference is measured by how mass is moved from one place to another (purple arrows); Right: local comparison, where the difference is measured by in-place comparison (yellow arrows); Middle: local and global trade-off, where we first move mass within the transport radius η and then compare the unfilled mass.

Figure 3: Example of 16 × 16 transport plan matrices under the two mechanisms. Zero elements are indicated by white blocks, while (possibly) non-zero elements are colored. The transport scope of a sample mass point (green block) is illustrated by the arrows. Left: relative scope by WFR transport with window size ω = 3; Right: absolute scope by the block-diagonal kernel with block size a = 4.

Algorithm 1: Scoring function on BM_d (PyTorch-like style). Require: two bounded measures m_{S_1}, m_{S_2} ∈ BM_d, entropic regularization ε, window size ω, number of blocks b such that the block size a = d/b ≥ ω, number of iterations L. The procedure Dist(m_{S_1}, m_{S_2}, ε, ω, a, b, L) reshapes both measures into b blocks of size a and runs the convolution-based Sinkhorn iterations on all blocks in parallel.

Figure 4: Visualization of different query embedding methods on the combinatorial generalizability benchmark (Wang et al., 2021). Results of BetaE and LogicE are taken from Wang et al. (2021). The slopes of the lines indicate how the performance on complex queries grows as the performance on the one-hop query grows.

Figure 6: Visualization of logical query structures. The query types on the left appear in the training phase, while all query types are used in the validation and test phases.

Table 1: MRR scores for answering all tasks on FB15k, FB15k-237, and NELL. Scores of baselines are taken from their original papers. Boldface indicates the best scores. A_P is the average score over queries without negation (EPFO queries); A_N is the average score over queries with negation.

Table 2: MRR scores of different query embedding methods on WN18RR. A_P is the average of scores from 1P, 2P, and 3P queries; A_ℓ is the average of scores from other queries without negation; A_N is the average of scores from queries with negation. Scores are taken from Huang et al. (2022).

Table 4: Best hyperparameters on every dataset.

Table 5: Number of training, validation, and test queries generated for different query structures.

Table 6: Hierarchical relations in WN18RR.

Table 8: WFRE metrics: mean values (×10⁻²) and standard deviations (×10⁻⁶, boldface).