A Pretraining Numerical Reasoning Model for Ordinal Constrained Question Answering on Knowledge Base

Knowledge Base Question Answering (KBQA) aims to answer natural language questions posed over knowledge bases (KBs). This paper targets empowering IR-based KBQA models with the ability of numerical reasoning for answering ordinal constrained questions. A major challenge is the lack of explicit annotations about numerical properties. To address this challenge, we propose a pretraining numerical reasoning model consisting of NumGNN and NumTransformer, guided by explicit self-supervision signals. The two modules are pretrained to encode the magnitude and ordinal properties of numbers respectively and can serve as model-agnostic plugins for any IR-based KBQA model to enhance its numerical reasoning ability. Extensive experiments on two KBQA benchmarks verify the effectiveness of our method in enhancing the numerical reasoning ability of IR-based KBQA models. Our code and datasets are available online.


Introduction
Knowledge Base Question Answering (KBQA) aims at finding answers from existing knowledge bases (KBs) such as Freebase (Bollacker et al., 2008) and DBpedia (Lehmann et al., 2015) for given questions expressed in natural language. KBQA has emerged as an important research topic in the last few years (Sun et al., 2018, 2019; Lan and Jiang, 2020; He et al., 2021), as the logically organized entities and relations in KBs can explicitly facilitate the QA process.
Two mainstream families of methods, the semantic parsing based (SP-based) models (Berant et al., 2013; Bao et al., 2016; Liang et al., 2017; Lan and Jiang, 2020) and the information retrieval based (IR-based) models (Sun et al., 2018, 2019; Saxena et al., 2020; He et al., 2021), are commonly studied to solve the KBQA task. The SP-based models heavily rely on the intermediate logic query parsed from the natural language question, which turns out to be the bottleneck of performance improvement (Lan et al., 2021). On the contrary, the IR-based models directly represent and rank the entities in a question-aware subgraph based on their relevance to the question. Such an end-to-end paradigm is easier to train and more fault-tolerant. However, most of the IR-based models focus on single- or multi-hop relation tasks. To answer the example question "Which is the largest city in China?" in Figure 1, the answer "Beijing" is supposed to encode not only the magnitude of its area but also the ordinal relationship with "largest", the ordinal determiner in the question. Existing IR-based models are not explicitly aware of the magnitude and ordinal properties of entities, making the entity representations fall short of supporting such numerical reasoning.
In view of this issue, this paper targets empowering IR-based KBQA models with the ability of numerical reasoning to address ordinal constrained questions. Ordinal constraint is summarized as one of the most important constraints via web query analysis (Bao et al., 2016), and ordinal is also defined as the second fundamental measurement to capture data in the form of surveys.
Some efforts have been made on numerical reasoning for machine reading comprehension (MRC) (Yu et al., 2018; Ran et al., 2019; Chen et al., 2020). For example, given a question and a passage from which the answer can be inferred, NumNet (Ran et al., 2019) is an end-to-end model that learns the number embeddings and the non-numerical word embeddings together, encoded by a graph neural network (GNN) (Kipf and Welling, 2017) and BERT (Devlin et al., 2019) respectively. QDGAT (Chen et al., 2020) further wires the numbers and the words into the same graph and encodes them together by a GNN. However, most of these models implicitly infer number embeddings from the QA pairs without explicit annotation of the magnitude and ordinal relationships of numbers. Such weak supervision signals make it difficult to infer accurate number embeddings, which becomes more prominent when ordinal supervision signals are rarely available in existing KBQA datasets. In fact, the three well-known KBQA benchmarks, MetaQA (Zhang et al., 2018), WebQuestionsSP (WebQSP) (Yih et al., 2016) and ComplexWebQuestions (CWQ) (Talmor and Berant, 2018), contain only 0, 101 and 1821 ordinal constrained questions respectively.
To tackle the above challenge, we propose a pretraining method with additional self-supervision signals to capture two critical ingredients for ordinal constrained KBQA:
• Relative Magnitude: The relative magnitude between numbers, such as "1 ≺ 2 ≺ 3", is to be preserved by the number embeddings.
• Ordinal Relationship: Based on the above relative magnitude, the ordinal relationship between each number and the ordinal determiner (such as "largest" in the question) is to be captured, e.g., 3 in "1 ≺ 2 ≺ 3" is identified as the largest number.
Number embeddings satisfying the above two ingredients are capable of numerical reasoning for ordinal constrained questions. To obtain such number embeddings, we propose two pretraining modules, NumGNN and NumTransformer. The former pretrains a GNN upon the constructed number graphs with a number-aware triplet loss function to preserve the relative magnitude, and the latter pretrains a transformer upon the constructed question-aware number graphs with a number prediction loss function to capture ordinal relationships. Compared with the weak supervision signals from QA pairs, such self-supervision signals explicitly denote the numerical properties.
After pretraining, NumGNN and NumTransformer can be attached as model-agnostic plugins into any IR-based KBQA model to infer number embeddings.By fusing the number embeddings into the entity embeddings learned by the basic model, the numerical reasoning ability of the basic model is enhanced.
Finally, we evaluate our method on two KBQA benchmarks: WebQSP and CWQ. Experimental results demonstrate that NumGNN plus NumTransformer, serving as plugins for alternative IR-based KBQA models, achieve substantial and consistent improvements (+2.4% to 14.8% in terms of accuracy) on the ordinal constrained questions.

Related Work
Knowledge Base Question Answering. Methods for the KBQA task can be categorized into two groups: SP-based methods and IR-based methods. Detailed surveys of the task can be found in (Lan et al., 2021; Zhang et al., 2021). SP-based methods (Berant et al., 2013; Berant and Liang, 2014; Yih et al., 2015; Bao et al., 2016; Liang et al., 2017; Lan and Jiang, 2020) learn a semantic parser to convert natural language questions into logic queries, which are able to deal with ordinal constrained questions. However, they heavily rely on intermediate logic queries, which becomes the bottleneck of performance improvement.
IR-based methods (Bordes et al., 2015; Dong et al., 2015; Miller et al., 2016; Sun et al., 2018, 2019; Saxena et al., 2020; He et al., 2021) directly retrieve answer candidates from the KBs and represent them to encode the semantic relationships with the questions. These methods are more fault-tolerant, but are unable to deal with ordinal constrained questions. This paper aims to enhance the IR-based models with numerical reasoning.
Numerical Reasoning. Numerical reasoning has been studied for various tasks such as word embeddings (Naik et al., 2019; Wallace et al., 2019), arithmetic word problems (AWP) (Wang et al., 2018; Zhang et al., 2020), and MRC (Yu et al., 2018; Ran et al., 2019; Chen et al., 2020). Word embeddings and AWP are somewhat far from our task. Similar to KBQA, MRC also aims to answer questions, but infers the answers from passages instead of KBs. To enable numerical reasoning, NumNet (Ran et al., 2019) adopts a numerically-aware GNN to encode numbers, and QDGAT (Chen et al., 2020) further extends the number graph with additional words. However, they are all end-to-end models weakly supervised by the final answers. This paper instead studies explicit supervision signals about numerical properties.

Figure 1: The whole reasoning process includes basic reasoning on the relation subgraph G q r and numerical reasoning on the attribute subgraph G q a. For numerical reasoning, we first apply the pretrained NumGNN and NumTransformer to infer value embeddings and then attach them into the entity embeddings learned by the basic reasoning. The final prediction is based on the fused entity embeddings.

Method
In this section, we first introduce the ordinal constrained KBQA task. Then the framework of our model is provided, followed by detailed descriptions of its components.

Problem Definition
A Knowledge Base G is the union of a relation graph G r and an attribute graph G a, where G r = {(e, r, e′)} and G a = {(e, a, v)}, with e (e′), r, a, and v denoting an entity, relation, attribute, and value respectively. Their initial embeddings e (0), r (0), v (0), and a (0) are encoded by RoBERTa (Liu et al., 2019) based on their names. Attributes are divided into numeric and non-numeric attributes, where the values of the former and the latter are presented as numbers and texts respectively.

Ordinal Constrained Question (Bao et al., 2016) denotes a question whose answers should be selected from a ranked set, with the ordinal determiners in the question as the ranking criteria. This paper manually defines a list of ordinal determiners: first, last, latest, earliest, largest, biggest, most, least, warmest, tallest, highest, lowest, longest, shortest, according to (Lan and Jiang, 2020).

Ordinal Constrained KBQA: Given an ordinal constrained question q and the topic entity e q present in q, we aim to retrieve the question-aware relation graph G q r and attribute graph G q a from G, perform basic reasoning on G q r and numerical reasoning on G q a, and then extract the answer e t from the two graphs based on the fused entity embeddings.
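As a concrete illustration, the two graphs can be indexed as adjacency maps over triples. This is a toy sketch only; the entity, attribute, and value names below are invented for illustration and do not come from the paper's KB.

```python
from collections import defaultdict

# Toy triples standing in for G_r = {(e, r, e')} and G_a = {(e, a, v)};
# all names and values here are hypothetical examples.
relation_triples = [("China", "contains", "Beijing"),
                    ("China", "contains", "Shanghai")]
attribute_triples = [("Beijing", "area", 16410.0),
                     ("Shanghai", "area", 6340.0)]

def build_kb(rel, attr):
    """Index the KB as adjacency maps: entity -> outgoing relation/attribute pairs."""
    g_r, g_a = defaultdict(list), defaultdict(list)
    for e, r, e2 in rel:
        g_r[e].append((r, e2))
    for e, a, v in attr:
        g_a[e].append((a, v))
    return g_r, g_a

g_r, g_a = build_kb(relation_triples, attribute_triples)
print(g_a["Beijing"])  # [('area', 16410.0)]
```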

Overall Framework
The framework of the proposed model is depicted in Figure 1. The reasoning process consists of basic reasoning on G q r and numerical reasoning on G q a. The former infers entity embeddings that encode the semantic relationships between the entities and the question, regardless of the numerical properties. The latter infers the value embeddings by the pretrained NumGNN and NumTransformer modules, and attaches them into the entity embeddings derived by the basic reasoning module to complement the relative magnitude and ordinal properties of entities.

Number Pretraining (NumGNN)
We randomly build a large set of number graphs from the given KB, upon which we perform GNN reasoning and optimize a number-aware triplet ranking loss to preserve the relative magnitude of numbers. Henceforth, we call a graph whose nodes are all numbers a number graph and denote it as G n.
Number Graph Construction.In a number graph G n , the nodes are composed of the values belonging to the same numerical attribute extracted from the given KB, and the edges are directed with each one pointing from a larger number to a smaller number.

In other words, a directed edge points from v_i to v_j if and only if n(v_i) > n(v_j), where n(v) denotes the number corresponding to the node/value v. Unlike our single "greater" edge, NumNet for MRC (Ran et al., 2019) builds both "greater" and "lower/equal" edges between nodes. As a result, NumNet needs to additionally incorporate weights to distinguish the effect of different relations during message passing in the GNN. Given this, we only keep the single "greater" relation, as it can already distinguish the magnitude of numbers and keeps the subsequent GNN model simple. We also verify this with the empirical results shown in Figure 2(c).
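The edge rule above can be sketched in a few lines; the function name and list-based graph representation are illustrative choices, not the paper's implementation:

```python
def build_number_graph(values):
    """Build a number graph with directed 'greater' edges:
    an edge (i, j) exists iff n(v_i) > n(v_j)."""
    nodes = list(values)
    edges = [(i, j)
             for i, vi in enumerate(nodes)
             for j, vj in enumerate(nodes)
             if vi > vj]
    return nodes, edges

# The three area values from the running example (Figure 1).
nodes, edges = build_number_graph([792, 2448, 6490])
# 6490 points to both smaller numbers, 2448 to one, 792 to none.
print(edges)
```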
We randomly sample a set of numerical attributes from the whole knowledge base G and extract the values of the same attributes to construct the number graphs.
Number Representation.Given a number graph G n , we use a GNN model to learn the number embeddings of the nodes by the following steps: (1) Node Initialization: Nodes in a number graph G n are initialized by the corresponding value embeddings {v (0) }.
(2) Message Passing: As we intend to preserve the relative magnitude between numbers, the role a number plays in reasoning should be affected by the surrounding numbers. Specifically, we propagate messages from each number to its neighbors by the following propagation function:

m_i = Σ_{v_j ∈ N_n(i)} α_j MLP(v_j),   (1)

where v_j is the number embedding of v_j and N_n(i) is the set of neighbors of v_i in G n. MLP in this paper abbreviates multi-layer perceptron.
The weight α_j is formulated as:

α_j = σ(MLP(v_j)),   (2)

where σ is the sigmoid function.
(3) Node Representation Update: The information carried by the neighbors is added to the node itself to update its representation:

v_i^(l+1) = v_i^(l) + m_i^(l).   (3)

The above steps (2) and (3) are repeated L times, resulting in the number embeddings {v^(L)} which preserve the relative magnitude between numbers. For convenient reference in the following sections, the entire NumGNN reasoning process (Eq. (1)-(3)) is denoted as a single function:

{v^(L)} = NumGNN({v^(0)}, G n).   (4)

Loss Function. We adopt a number-aware triplet ranking loss for NumGNN optimization. Specifically, from each number graph G n, we randomly sample a set of triplets, each consisting of three numbers, and assume that the small number v_s should be closer to the medium one v_m than to the big one v_b. In other words, "v_s ≺ v_m ≺ v_b" should be satisfied to reflect the relative distance between numbers rather than the absolute magnitude. We minimize the following triplet ranking loss to learn the parameters of NumGNN:

L_NG = Σ_{(v_s, v_m, v_b) ∈ T} max(0, g(v_s, v_b) − g(v_s, v_m) + γ),   (5)

where g is the cosine similarity between two number embeddings, T is the set of sampled triplets, and γ is a margin separating (v_s, v_m) and (v_s, v_b).
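A minimal sketch of one message-passing step and the triplet ranking loss. Uniform averaging stands in for the learned weights α_j, and the MLP is passed in as a toy callable; all names and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

def num_gnn_layer(V, neighbors, mlp):
    """One NumGNN step (sketch): aggregate MLP-transformed neighbor
    embeddings (uniform weights stand in for alpha_j), then residual update."""
    V_new = V.copy()
    for i, nbrs in neighbors.items():
        if nbrs:
            msgs = np.stack([mlp(V[j]) for j in nbrs])
            V_new[i] = V[i] + msgs.mean(axis=0)
    return V_new

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def triplet_ranking_loss(vs, vm, vb, margin=0.1):
    """max(0, g(v_s, v_b) - g(v_s, v_m) + margin): the small number should
    be closer to the medium one than to the big one."""
    return max(0.0, cosine(vs, vb) - cosine(vs, vm) + margin)

# Toy check: an orthogonal "big" embedding incurs zero loss.
vs, vm, vb = np.array([1.0, 0.0]), np.array([1.0, 0.1]), np.array([0.0, 1.0])
print(triplet_ranking_loss(vs, vm, vb))  # 0.0
```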

Number Pretraining (NumTransformer)
Based on the number embeddings output by NumGNN, we further need to connect the numbers to the ordinal determiners to learn the ordinal properties of numbers. For example, we aim to make the embedding of 1 in "1 ≺ 2 ≺ 3" closer to the ordinal determiner "smallest" than those of 2 and 3.
To achieve this goal efficiently, we build a set of question-aware number graphs from the ordinal constrained QA pairs, upon which we pretrain NumTransformer with a number prediction loss. Other datasets that indicate the relationship between ordinal determiners and numbers could also be used for pretraining.
Question-aware Number Graph Construction.
For each ordinal constrained question q, we find the numerical attribute a t of the answer entity e t that is most relevant to q by measuring the cosine similarity between the attribute embeddings a (0) and the question embedding q (0) encoded by RoBERTa.
Then we retrieve v t in (e t, a t, v t) as the ground truth value and sample other values of the same attribute a t as negative instances. We restrict the negative instances to within three hops of the topic entity e q to avoid destroying the question-specific ordinal relationship. We construct a number graph G n from the ground truth and the negative values in the same way as Section 3.3. G n together with the question q composes a question-aware number graph pair (q, G n).
Number Representation. Given a question-aware number graph pair (q, G n), we apply NumGNN on G n by Eq. (4) to output the number embeddings {v^(L)}. Then we concatenate them with the word embeddings h_q^(0) of the question q encoded by RoBERTa as the input of a transformer to update the number embeddings:

[h_q^(L′); {v^(L′)}] = Transformer([h_q^(0); {v^(L)}]),   (6)

where L′ is the number of layers in the Transformer. Thanks to the multi-layer self-attention, the updated number embeddings {v^(L′)} have fully interacted with the question words such that they can encode the ordinal semantics, e.g., being a "largest" or "smallest" number.
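The number-prediction objective can be sketched as follows: score each question-contextualized number embedding, take a softmax over the candidates in the graph, and compute the cross-entropy of the ground-truth value. The scoring head here is a hypothetical stand-in for the model's MLP:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def number_prediction_loss(num_embs, true_idx, score_head):
    """Cross-entropy over the candidate numbers of one question-aware
    number graph; score_head maps an embedding to a scalar logit."""
    logits = np.array([score_head(v) for v in num_embs])
    p = softmax(logits)
    return -np.log(p[true_idx] + 1e-12)

# Toy check with a hypothetical linear scoring head.
embs = [np.array([2.0, 0.0]), np.array([0.0, 2.0]), np.array([0.0, 0.0])]
loss = number_prediction_loss(embs, 0, lambda v: float(v[0]))
# The loss is lower when the head already ranks the ground truth highest.
```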
Loss Function. Since the output number embeddings of NumTransformer are expected to encode the ordinal properties, we can predict the ground truth number based on its output embedding and adopt a cross-entropy loss to train NumTransformer. The predictive probability of the ground truth number v_t in G n is formulated as:

p(v_t | q, G n) = softmax(MLP({v^(L′)}))_t.   (7)

Basic Reasoning
We adopt the subgraph retrieval and reasoning scheme for basic reasoning.

Relation Subgraph Retrieval.
We follow GRAFT-Net (Sun et al., 2018) to extract the neighborhood relation triplets within two hops of the topic entity e q. To reduce the number of triplets, we also perform personalized PageRank (Haveliwala, 2002) to keep the entities most relevant to q. The resultant relation triplets compose the query-relevant relation subgraph G q r.
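The pruning step can be sketched as a standard personalized PageRank power iteration over the subgraph adjacency, with the restart mass concentrated on the topic entity. The parameter values below are common illustrative defaults, not the paper's settings:

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=50):
    """Power iteration for personalized PageRank: at each step a fraction
    alpha of the mass restarts at the seed (topic entity)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize to a transition matrix, guarding zero-degree nodes.
    P = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    r = np.zeros(n)
    r[seed] = 1.0
    p = r.copy()
    for _ in range(iters):
        p = alpha * r + (1 - alpha) * P.T @ p
    return p

# Toy star graph: entity 0 is the topic entity connected to 1 and 2.
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
scores = personalized_pagerank(adj, seed=0)
# The seed retains the highest score; low-scoring entities are pruned.
```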
Loss Function. The predictive probability of the answer e t is formulated as:

p(e_t | q, G q r) = σ(MLP(e_t)).   (9)

The cross-entropy loss is optimized on both ordinal and non-ordinal constrained questions.

Numerical Reasoning
We first retrieve an attribute subgraph G q a for q, then apply the pretrained NumGNN and NumTransformer (with frozen parameters) to infer the value embeddings in G q a, which are then attached to the entity embeddings in G q r for numerical reasoning. This process is visualized in Figure 1.
Attribute Subgraph Retrieval. We extract the numerical attribute triplets for entities in G q r to compose the attribute subgraph G q a. More specifically, from all the numerical attributes of the entities in G q r, we extract the top-K attributes relevant to the question q by measuring the cosine similarity between the attribute embeddings and the question embedding, and add the attribute triplets {(h, a, v), h ∈ G q r} associated with these attributes into G q a.
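The top-K selection is a straightforward cosine ranking. The embeddings below are random stand-ins for RoBERTa encodings, and the attribute names are invented; only the ranking logic reflects the step described above:

```python
import numpy as np

def topk_attributes(attr_embs, q_emb, k=3):
    """Rank attributes by cosine similarity to the question embedding
    and keep the top-K (K = 3 in the paper's setup)."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))
    sims = {a: cos(e, q_emb) for a, e in attr_embs.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# Hypothetical embeddings: "area" is constructed to be near the question.
rng = np.random.default_rng(0)
q = rng.normal(size=64)
attrs = {"area": q + 0.1 * rng.normal(size=64),
         "population": rng.normal(size=64),
         "founded": rng.normal(size=64)}
print(topk_attributes(attrs, q, k=1))  # ['area']
```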
Number Embedding Inference.The values in G q a compose multiple number graphs {G n }.Each G n is composed of the values of the same attributes and is built in the same way as Section 3.3.Their value embeddings are updated by the pretrained NumGNN in Eq. ( 4).Then they are concatenated with the question word embeddings as the input of the pretrained NumTransformer in Eq. ( 6) to be further updated.
Number Embedding Plugin. The updated value embeddings {v^(L′)} from G q a can be incorporated into the entity embeddings {e} learned by the basic reasoning module on the relation graph G q r. Specifically, we aggregate the value embeddings by attention weights associated with the neighborhood attributes of the i-th entity:

e′_i = Σ_{j ∈ N_a(i)} α_j [a_j; v_j],   (10)

where N_a(i) is the set of the i-th entity's attribute neighbors, and a_j and v_j are the attribute embedding and the value embedding of the j-th neighbor respectively. The weight α_j emphasizes the question-relevant values. Finally, we concatenate the aggregated embedding e′_i propagated from G q a with the corresponding entity embedding e_i in G q r to compose the ordinal-aware entity embedding:

e^f_i = [e_i; e′_i].   (12)

Algorithm 1: Training process.
6: Retrieve a relation graph G q r for each q;
7: Train θ_BR and {r} by cross-entropy on Eq. (9) and update {e};
/* Train Numerical Reasoning Module */
8: Retrieve an attribute graph G q a for each q;
9: Build {(q, G n)} from G q a;
10: Apply NumGNN and NumTransformer to update {v};
11: Attach v into the corresponding e;
12: Train θ_BR, θ_NR, {r}, and {a} by cross-entropy on Eq. (9) and Eq. (13) jointly.
Note that the aggregated embedding e′_i is set to 0 if the i-th entity does not have numerical attributes.
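The plugin step can be sketched as a question-keyed softmax attention over the attribute neighbors followed by concatenation; the function name and the dot-product attention form are illustrative assumptions:

```python
import numpy as np

def attach_values(e_i, attr_nbrs, q_emb):
    """Fuse value embeddings into an entity embedding (sketch of the
    number-embedding plugin). attr_nbrs is a list of
    (attribute_embedding, value_embedding) pairs; q_emb keys the attention."""
    if not attr_nbrs:
        agg = np.zeros_like(e_i)  # entities without numerical attributes
    else:
        sims = np.array([float(a @ q_emb) for a, _ in attr_nbrs])
        w = np.exp(sims - sims.max())
        w /= w.sum()
        agg = sum(wi * v for wi, (_, v) in zip(w, attr_nbrs))
    return np.concatenate([e_i, agg])  # ordinal-aware entity embedding

# An entity with no numerical attributes keeps a zero second half.
fused = attach_values(np.ones(4), [], None)
```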
Loss Function. The predictive probability of the answer e t is formulated as:

p(e_t | q, G q r, G q a) = σ(MLP(e^f_t)).   (13)

The cross-entropy loss is optimized on the ordinal constrained questions.

Training & Prediction
The training process is presented in Algorithm 1. The parameters to be optimized are θ_NG of NumGNN, θ_NT of NumTransformer, θ_BR of the basic reasoning module, θ_NR of the numerical reasoning module, as well as the relation embeddings {r} and the attribute embeddings {a}. Note that the parameters in Eq. (8) for embedding entities are shared between θ_BR and θ_NR. The parameters in Eq. (9) for basic prediction and those in Eq. (10)-(13) for numerical prediction are separate.

For each question q, we retrieve the relation subgraph G q r and the attribute subgraph G q a, and predict the probability of each entity candidate in G q r by Eq. (13) if the question is ordinal constrained or by Eq. (9) otherwise.
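The routing decision can be sketched as a simple match against the manually defined determiner list from Section 3.1; the token-level matching below is an illustrative simplification:

```python
# The paper's manually defined ordinal determiners (Section 3.1).
ORDINAL_DETERMINERS = {"first", "last", "latest", "earliest", "largest",
                       "biggest", "most", "least", "warmest", "tallest",
                       "highest", "lowest", "longest", "shortest"}

def is_ordinal(question):
    """Route a question to numerical prediction (Eq. (13)) or basic
    prediction (Eq. (9)) by matching ordinal determiners."""
    tokens = question.lower().replace("?", "").split()
    return any(tok in ORDINAL_DETERMINERS for tok in tokens)

print(is_ordinal("Which is the largest city in China?"))  # True
```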
Evaluation Metrics. We follow GRAFT-Net to rank the candidate entities for each question by their predictive probabilities and then evaluate Hits@1 to reflect the accuracy of the top-1 prediction.
Baselines. We compare with three IR-based KBQA models: GRAFT-Net (Sun et al., 2018), EmbedKGQA (Saxena et al., 2020) and NSM (He et al., 2021). Compared with a vanilla GNN, GRAFT-Net and NSM incorporate the question into graph convolution. EmbedKGQA directly optimizes the (topic entity, question, answer) triplet based on their embeddings. PullNet (Sun et al., 2019), an extension of GRAFT-Net, is not evaluated because its code is not released.

Implementation Details.
We construct train/valid/test sets of 10000/3000/4000 number graphs for NumGNN pretraining and train/valid/test sets of 500/60/80 question-aware number graphs for NumTransformer pretraining. Datasets of this scale are sufficient to capture the ordinal relationships since the initial question word embeddings and the number embeddings have already been pretrained. The size of a number graph in both NumGNN and NumTransformer pretraining is kept within 2 to 150 nodes to balance efficiency and effectiveness. We unify the units of the same attribute and only compare numbers belonging to the same attribute. We extract the top-K (K = 3) attributes relevant to q to build the attribute subgraph.
We run experiments on a single Tesla V100 GPU with 32GB memory. Both number pretraining processes finish within 20 minutes. Taking NSM as an example, with our plugins it takes around 850/76 seconds per epoch to train the model on the CWQ/WebQSP dataset. All the models are trained on the training set, selected on the validation set, and evaluated on the test set. The validation set of WebQSP contains scarce ordinal labels (only 4 ordinal constrained questions). The basic models ignore the numerical attributes and values of entities, and thus clearly underperform the corresponding number-enhanced models.
The performance improvement on WebQSP is more significant than that on CWQ, as the questions in CWQ are more complex, which results in many mistakenly reasoned entities, on which the ordinal constraints are hard to satisfy.

Ablation Study
We evaluate the following model variants on GRAFT-Net to investigate the effect of different components: +NumGNN: with non-pretrained NumGNN, meaning that NumGNN is trained from scratch end-to-end with numerical reasoning.

Parameter and Embedding Analysis
NumGNN Layer Size L. Figure 2(a) presents the direct performance of the pretrained NumGNN and the final ordinal constrained QA performance with various numbers of NumGNN layers. To evaluate the direct performance, we build a set of number graphs from the given KB in the same way as Section 3.3 and evaluate whether NumGNN explicitly preserves the relative magnitude between the largest and the smallest numbers in each graph. Specifically, we reduce the number embeddings to 1-dimensional scores, calculate the sign of the score difference of the two numbers, compare it with the original sign, and finally evaluate the accuracy.
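The direct evaluation described above can be sketched as a sign check between the reduced scores and the original numbers; the toy score values below are invented for illustration:

```python
import numpy as np

def magnitude_preserved(numbers, scores):
    """Compare the sign of the reduced 1-d score difference between the
    largest and smallest numbers with the sign of the number difference."""
    i, j = int(np.argmax(numbers)), int(np.argmin(numbers))
    return np.sign(scores[i] - scores[j]) == np.sign(numbers[i] - numbers[j])

def direct_accuracy(graphs):
    """Fraction of number graphs whose largest/smallest pair keeps a
    consistent ordering in the reduced scores."""
    return sum(magnitude_preserved(n, s) for n, s in graphs) / len(graphs)

# Two toy graphs: the first preserves the ordering, the second reverses it.
graphs = [([1, 3, 2], [0.1, 0.9, 0.5]),
          ([10, 20], [0.8, 0.2])]
print(direct_accuracy(graphs))  # 0.5
```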
Considering both the direct accuracy and the final QA performance, the 2-layer NumGNN performs the best: a 1-layer GNN is too shallow to distinguish the number magnitudes, while a 3-layer one over-smooths the number embeddings.
NumTransformer Layer Size L′. Figure 2(b) presents the direct performance of NumTransformer and the final ordinal constrained QA performance with various numbers of NumTransformer layers. We evaluate the ability to predict the right number corresponding to the ordinal determiner of the question in the same way as Section 3.4. Considering both the direct and final evaluations, the 2-layer NumTransformer performs the best, which is consistent with the layer selection of NumGNN. Due to the small amount of training data for NumTransformer, the model is sensitive to L′: a large L′ introduces too many parameters and leads to overfitting, while a small L′ provides too few parameters to capture the features of the training data and causes underfitting.
NumGNN Graph Relation Type. We study whether the single "greater" relation in number graphs is enough to learn the numerical properties, compared with the multi-typed relations (including "greater", "equal" and "lower" types) defined by NumNet. We perform both the direct and final QA evaluations for the single-typed and multi-typed relations. The results in Figure 2(c) show that the direct performances are almost the same, but the single-typed setting outperforms the multi-typed setting in terms of Hits@1 on the final ordinal QA. Moreover, considering that the multi-typed setting demands additional weights during graph convolution to distinguish the types' effects, the single-typed relation is a better choice in our model.
Number Embeddings. We visualize the reduced 1-dimensional scores of the number embeddings in an example number graph in Figure 3. The relative magnitude between almost all the numbers is maintained. Since the scores can only reflect the relative distance rather than the absolute magnitude, the absolute order may be kept or reversed. In fact, more than 95% of the number graphs in our datasets keep the relative magnitude between the largest and the smallest numbers, and more than 35% keep all the numbers' relative magnitude, which indicates NumGNN's capacity for encoding the relative magnitude.

Conclusion
This paper proposes a pretraining numerical reasoning model for ordinal constrained KBQA. Via pretraining with explicit supervision signals, NumGNN and NumTransformer are capable of capturing the magnitude and ordinal properties of numbers. By attaching them as plugins into any IR-based KBQA model, the numerical reasoning ability of the model can be enhanced. The experimental results on two benchmarks verify the effectiveness of our model. Other types of constraints, such as multiple topic entities, type and aggregation constraints, are to be explored in the future.
+NumGNN (Pretrained): NumGNN is first pretrained and then frozen during numerical reasoning.
+Num: with non-pretrained NumGNN plus the non-pretrained NumTransformer.
+Num (Pretrained): with pretrained NumGNN and pretrained NumTransformer.
The results in Table 3 reflect 1) the effectiveness of both NumGNN and NumTransformer; 2) the positive guidance of the pretraining loss functions for NumGNN and NumTransformer; 3) the inadequacy of the end-to-end QA supervision signals for NumGNN and NumTransformer.
Figure 2: Direct and final evaluations of (a) NumGNN or (b) NumTransformer with different layers; (c) Direct and final evaluations of NumGNN with different relation types.

Figure 3: A case study of the number embeddings output by NumGNN. The upper and the lower bars present the relative magnitude between the original numbers and between the reduced 1-dimensional scores respectively.


Table 1: Data statistics. #All/Ordinal QA pairs for training, validating and testing are presented. |G q r| and |G q a| are the average numbers of nodes in the retrieved relation subgraph G q r and attribute subgraph G q a respectively. Coverage and Coverage(O) are the coverage rates of the answers by the subgraphs over all/ordinal QA pairs respectively.

Table 3: Ablation results of the model variants on GRAFT-Net.