Uni-Parser: Unified Semantic Parser for Question Answering on Knowledge Base and Database

Parsing natural language questions into executable logical forms is a useful and interpretable way to perform question answering on structured data such as knowledge bases (KB) or databases (DB). However, existing approaches to semantic parsing cannot adapt to both modalities, as they suffer from the exponential growth of logical form candidates and can hardly generalize to unseen data. In this work, we propose Uni-Parser, a unified semantic parser for question answering (QA) on both KB and DB. We define the primitive (relation and entity in KB, and table name, column name and cell value in DB) as the essential element in our framework. The number of primitives grows only linearly with the number of retrieved relations in KB and DB, which prevents an exponential number of logical form candidates. We use a generator to predict the final logical form by altering and composing top-ranked primitives with different operations (e.g., select, where, count). With the search space sufficiently pruned by a contrastive primitive ranker, the generator is able to capture the composition of primitives, which enhances its generalization ability. We achieve competitive results on multiple KB and DB QA benchmarks with higher efficiency, especially in the compositional and zero-shot settings.


Introduction
With the recent advances in deep neural networks, question answering (QA) systems enable users to interact with massive data using queries in natural language. However, it remains challenging to access structured data, such as knowledge bases and databases. Semantic parsing is a core step of question answering over structured data. The goal is to convert a natural language question into an executable logical form (Berant et al., 2013; Yih et al., 2015), e.g., SQL for databases and S-expressions for knowledge bases.
To improve the accuracy and faithfulness of semantic parsing in execution, recent KBQA studies propose to enumerate logical form candidates and select the best one by ranking (Berant and Liang, 2014; Yih et al., 2015; Sun et al., 2020; Ye et al., 2021). However, the number of logical form candidates may grow exponentially with the reasoning depth of complex questions. Thus this approach can suffer from poor runtime performance due to time-consuming logical form enumeration (Gu et al., 2021) and inefficient candidate ranking (Ye et al., 2021). For example, given an entity in a KB, we collect its logical form candidates by enumerating paths of up to two hops. If the first hop from the given entity contains N relations and the second hop contains M relations, the enumeration results in N × M logical forms. As KBs typically contain massive structured knowledge, an entity can have hundreds of linked relations (Bollacker et al., 2008). Moreover, for complex questions that require combining logic or aggregation operations, such as COUNT, ARGMIN, or ARGMAX, into the logical form, the situation can be even worse. Therefore, traditional enumeration methods may fail in cases requiring complicated reasoning (e.g., involving many entities and long reasoning chains). The problem becomes even more severe, and often prohibitive, when the enumeration method is applied to other structured data with dense connections between entities, such as databases. Designing a unified semantic parsing method for various modalities of structured data has significant theoretical and practical value, yet it remains an understudied topic.
To avoid the exponential growth of logical form enumeration, we consider logical forms composed of two types of elements: primitives and operations. Primitives are defined by the schema of the structured data source, and operations are a set of grammars associated with primitives.
For example, in knowledge bases, primitives are defined as the relations and entities in the knowledge graph, whereas in databases, primitives are tables, columns, and cells. With this formulation, the number of candidates is greatly reduced, from N × M to N + M.
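To make the N × M versus N + M contrast concrete, here is a small Python sketch. The relation names are hypothetical placeholders, and it assumes the worst case in which every second-hop relation is reachable from every first-hop relation, so N × M is an upper bound on the real number of two-hop paths.

```python
from itertools import product

# Hypothetical relation names for one question entity (worst case: every
# second-hop relation is reachable from every first-hop relation).
first_hop = [f"r1_{i}" for i in range(100)]   # N = 100
second_hop = [f"r2_{j}" for j in range(80)]   # M = 80


def enumerate_logical_forms(first, second):
    """Logical-form enumeration: one candidate per (first hop, second hop) path."""
    return [(r1, r2) for r1, r2 in product(first, second)]


def enumerate_primitives(first, second):
    """Primitive enumeration: each relation appears once, tagged by hop category."""
    return [("<|firsthop|>", r) for r in first] + [("<|secondhop|>", r) for r in second]


print(len(enumerate_logical_forms(first_hop, second_hop)))  # 8000 (N x M)
print(len(enumerate_primitives(first_hop, second_hop)))     # 180 (N + M)
```

The ranker then scores 180 short items instead of 8,000 paths; composing the chosen first and second hops is left to the generator.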
In this study, we present Uni-Parser, a unified semantic parser for question answering on both knowledge bases (KBs) and databases (DBs). Our model follows the Enumeration-Ranker-Generator framework proposed in RnG-KBQA (Ye et al., 2021). We first enumerate question-relevant primitives from a given KB or DB. A cross-encoder ranker trained with contrastive learning (Chang et al., 2020), further enhanced by a special hard negative sampling strategy, then selects the best candidates. After obtaining the top-k ranked primitives for each hop, we filter out the high-order primitives that cannot be reached in the KB, or do not exist in the DB, from the selected low-order primitives. Next, a generator consumes both the question and the filtered top-k primitives and composes them with predicted operations into the final logical form. Because it starts from primitives rather than logical forms, our generator needs to understand the semantic meaning of each primitive in order to compose them into a logical form.
Our contributions can be summarized as follows:
• We propose a unified semantic parser working for both KB and DB question answering.
• We enumerate primitives rather than logical forms, which greatly reduces the search space and makes candidate generation and ranking more efficient and scalable.
• The composition of logical forms from primitives and operations is postponed to the generation phase. Thus, the generator is required to learn the compositional relations among primitives. This leads to a more generalized model that can handle complex logical forms and generalize to questions involving unseen schema.
• Extensive empirical results on four KB and DB QA datasets demonstrate the effectiveness, flexibility and scalability of our unified framework.

Problem Formulation
Given a structured data source D and a question X in natural language, a semantic parser is tasked with generating the corresponding logical form Y. Specifically, the details depend on the type of D as follows: (1) Knowledge Base: Data is stored as subject-relation-object triples (s, r, o), where s is an entity, r is a relation, and o is an entity or a literal (e.g., integer values, dates, etc.). We use S-expressions (Gu et al., 2021) to represent logical forms for KBs. An S-expression queries a KB in terms of entity types, and operations on the KB are treated as operations on sets of entities. This formulation greatly reduces the number of operations compared with traditional lambda DCS (Liang, 2013).
(2) Database: A DB QA dataset typically consists of multiple tables, each with a set of columns and L rows. For tabular data, SQL is used to represent logical forms (Yu et al., 2018).
Logical forms can be decomposed into primitives and operations, both of which are defined by the schema of the structured data. We define primitives as the atomic elements that are either entities themselves or can be used to navigate to entities. Operations are a set of grammars that associate primitives. We list the specific primitives and operations considered in this study in Table 1.
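The remark that S-expression operations act on sets of entities can be illustrated with a minimal Python sketch. The toy triples and entity names are hypothetical, and the JOIN/AND/COUNT helpers are a simplified reading of the S-expression operators of Gu et al. (2021), not their reference implementation.

```python
# Toy KB of (subject, relation, object) triples; all names are hypothetical.
TRIPLES = {
    ("book_a", "book.written_work.author", "alice"),
    ("book_b", "book.written_work.author", "alice"),
    ("book_c", "book.written_work.author", "bob"),
    ("alice", "people.person.nationality", "canada"),
}


def join(relation, entities, reverse=False):
    """JOIN over a set of entities.

    forward: {s | (s, relation, o) in KB and o in entities}
    reverse: {o | (s, relation, o) in KB and s in entities}, i.e. the (R relation) form
    """
    if isinstance(entities, str):
        entities = {entities}
    if reverse:
        return {o for s, r, o in TRIPLES if r == relation and s in entities}
    return {s for s, r, o in TRIPLES if r == relation and o in entities}


def and_(left, right):
    """AND: set intersection."""
    return left & right


def count(entities):
    """COUNT: set cardinality."""
    return len(entities)


# (COUNT (JOIN book.written_work.author alice))
print(count(join("book.written_work.author", "alice")))          # 2
# (JOIN (R people.person.nationality) alice)
print(join("people.person.nationality", "alice", reverse=True))  # {'canada'}
```

Because every operator maps sets to sets (or to a number), nested S-expressions evaluate by simple function composition.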

Methodology
Our model first obtains the relevant primitives from the given structured data and question, and groups them into different categories (Sec 3.1). Then a ranker filters out the irrelevant primitives (Sec 3.2) and provides the top-ranked primitives to the generator, which produces the final logical form (Sec 3.3). We first explain how to extract primitive candidates from a KB or DB based on the question.

Primitive Enumeration
Rather than enumerating all possible logical forms as in RnG-KBQA (Ye et al., 2021), here we only enumerate primitives that are relevant to the question.
In the case of knowledge bases, we start by detecting the entities mentioned in the question with an off-the-shelf NER system and then run fuzzy matching (Lin et al., 2020) against entity names in the knowledge base to identify relevant entities. This technique is also used by Gu et al. (2021) and Chen et al. (2021). To alleviate entity ambiguity, we follow Ye et al. (2021) and use a ranker model to select entity candidates based on the similarity between the question and the one-hop in/out relations of each entity. Since most questions involve two-hop reasoning in KBs, we also extract the two-hop paths associated with question entities as a related sub-graph. We define <|firsthop|> category primitives as the question entities paired with their first-hop relations, and likewise for <|secondhop|> category primitives (examples are shown at the top of Figure 1).
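A minimal Python sketch of the two-hop enumeration, assuming the question entity has already been linked. The triples are hypothetical, only outgoing edges are followed, and a <|firsthop|> primitive is represented here as an (entity, relation) pair; the exact primitive string format used by Uni-Parser may differ.

```python
from collections import defaultdict


def enumerate_kb_primitives(entity, triples):
    """Return (<|firsthop|>, <|secondhop|>) primitives within two hops of `entity`.

    Simplification: only outgoing edges (subject -> object) are followed.
    """
    out_edges = defaultdict(set)
    for s, r, o in triples:
        out_edges[s].add((r, o))

    first, second = set(), set()
    for r1, mid in out_edges[entity]:
        first.add(("<|firsthop|>", entity, r1))
        for r2, _ in out_edges[mid]:
            second.add(("<|secondhop|>", r2))
    # Sets deduplicate relations, so the total grows as N + M, not N x M.
    return sorted(first), sorted(second)


TRIPLES = [
    ("alice", "people.person.nationality", "canada"),
    ("alice", "people.person.place_of_birth", "toronto"),
    ("canada", "location.country.capital", "ottawa"),
    ("toronto", "location.location.containedby", "canada"),
]
first, second = enumerate_kb_primitives("alice", TRIPLES)
print(first)
print(second)
```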
As for databases, we consider two formats of primitives. The first category, <|tb_cl|>, denotes the format table_name.column_name. The second, <|tb_cl_vl|>, represents the format table_name.column_name <op> cell_value, where <op> is a conditional operation as shown in Table 1. To enumerate the first category, we can simply use all table names together with their column names. For the second category, however, including every cell value would produce a vast number of candidates. For instance, on the Spider dataset (Yu et al., 2018), treating every cell value as a candidate yields up to 263K candidates for a single question. To address this issue, following Lin et al. (2020), we perform fuzzy string matching between the question X and the cell values V under each column C, and pair each matched value with its column name. One shortcoming of string matching is that it achieves only 15% coverage on Spider, because many cell values are numeric and cannot be detected by string matching. For example, for the question "How many heads of departments are older than 56?", no cell value matches 56. Therefore, if a question contains numbers, we pair them with all column names in the table.
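The DB side can be sketched in Python as follows. Python's `difflib.SequenceMatcher` stands in for the fuzzy matcher of Lin et al. (2020); the 0.85 threshold, the n-gram length, the nested-dict DB layout, and the (column, value) output format (with the conditional operator left to later stages) are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def question_phrases(question, max_n=4):
    """All word n-grams (n <= max_n) of the lower-cased question."""
    tokens = re.findall(r"\w+", question.lower())
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def enumerate_db_primitives(question, db, threshold=0.85):
    """db: {table: {column: [cell values]}} (hypothetical layout)."""
    # <|tb_cl|>: every table_name.column_name pair.
    tb_cl = [f"{t}.{c}" for t, cols in db.items() for c in cols]

    # <|tb_cl_vl|>: fuzzy-match question phrases against cell values.
    phrases = question_phrases(question)
    tb_cl_vl = set()
    for t, cols in db.items():
        for c, values in cols.items():
            for v in values:
                v_str = str(v).lower()
                if any(SequenceMatcher(None, p, v_str).ratio() >= threshold for p in phrases):
                    tb_cl_vl.add((f"{t}.{c}", str(v)))

    # Numbers rarely match a cell string, so pair each number with every column.
    for num in re.findall(r"\d+(?:\.\d+)?", question):
        tb_cl_vl.update((col, num) for col in tb_cl)
    return tb_cl, sorted(tb_cl_vl)


DB = {"head": {"name": ["Alice", "Bob"], "age": [67, 52]},
      "department": {"name": ["Treasury", "Commerce"]}}
print(enumerate_db_primitives("How many heads of departments are older than 56?", DB))
```

For the "older than 56" question no cell string matches, so the only value primitives come from pairing 56 with every column, which mirrors the fallback described above.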

Primitive Ranker
Our ranker model learns to filter out irrelevant primitives by measuring the similarity between questions and primitive candidates. We use the cross-encoder architecture (Chang et al., 2020) for the ranker, which has been shown to outperform bi-encoder architectures (Thakur et al., 2020; Lei et al., 2022). As shown in Figure 2, we use a special category token as a prompt to differentiate the categories of primitives in the input. Note that primitives from all categories share the same ranker. Specifically, given a question X and a primitive p with category token p_c, we use a BERT-based encoder that takes the concatenation of the question and the primitive as input and outputs a logit s(X, p) representing their similarity:
$$s(X, p) = \mathrm{FNN}\big(\psi_\theta(X \oplus p_c \oplus p)\big)$$
where ⊕ denotes concatenation, ψ_θ denotes the [CLS] representation of the concatenated input after BERT encoding, and FNN is a projection layer that reduces the representation to a scalar similarity score. p_c is the special token that distinguishes the category of the primitive (in KB, p_c ∈ {<|firsthop|>, <|secondhop|>}; in DB, p_c ∈ {<|tb_cl|>, <|tb_cl_vl|>}).
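The ranker is optimized to minimize the contrastive loss
$$\mathcal{L} = -\log \frac{\exp\big(s(X, p^+)\big)}{\exp\big(s(X, p^+)\big) + \sum_{p^- \in P^-} \exp\big(s(X, p^-)\big)}$$
where s(X, p) is the similarity logit produced by the ranker, p^+ is the positive primitive extracted from the gold logical form, and P^- is the set of negative primitives from the same category p_c.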
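Assuming a standard softmax-style contrastive loss, i.e. the negative log-probability of the positive primitive among {p+} ∪ P−, the loss for one question can be computed from the ranker logits as below. This plain-Python sketch is for illustration only; a real implementation would operate on batched tensors.

```python
import math


def contrastive_loss(pos_logit, neg_logits):
    """Negative log-probability of the positive primitive among {p+} and its negatives.

    Uses the log-sum-exp trick so large logits do not overflow.
    """
    logits = [pos_logit, *neg_logits]
    m = max(logits)
    log_partition = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_partition - pos_logit


print(round(contrastive_loss(2.0, [0.0, 0.0]), 4))
```

The loss approaches zero as the positive logit dominates the negatives, and grows when any negative from the same category scores higher.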

Negative Sampling
Since a large number of negative primitive candidates can be paired with a positive example, negative sampling is necessary. A straightforward approach is random sampling, but it may suffer from the dominance of uninformative negatives (Xiong et al., 2020). We therefore design a strategy to sample hard negative candidates for training the ranker (Liu et al., 2021b), combined with bootstrap negative sampling, in which the model is trained recursively on the false positive candidates generated in the last training epoch.
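The category-specific hard-negative rules used by this strategy (second-hop negatives restricted to primitives connected to the ground-truth first hop; for <|tb_cl|>, the same table but a different column; for <|tb_cl_vl|>, the same table and column but a different cell value) can be sketched in Python as below. The tuple layouts, the helper names, and topping up with random negatives when too few hard ones exist are assumptions; the bootstrap resampling from the previous epoch's false positives is not shown.

```python
import random


def hard_negatives_db(gold, candidates, category):
    """DB hard negatives; items are (table, column, value) with value=None for <|tb_cl|>."""
    table, column, value = gold
    if category == "<|tb_cl|>":
        # same table, different column
        return [x for x in candidates if x[0] == table and x[1] != column]
    # <|tb_cl_vl|>: same table and column, different cell value
    return [x for x in candidates
            if x[0] == table and x[1] == column and x[2] != value]


def hard_negatives_kb_second_hop(gold_first, gold_second, second_hops_of):
    """Second-hop negatives drawn only from relations reachable via the gold first hop."""
    return sorted(second_hops_of.get(gold_first, set()) - {gold_second})


def sample_negatives(hard, pool, gold, k, seed=0):
    """Take up to k hard negatives, then top up with random ones (assumed policy)."""
    rng = random.Random(seed)
    hard = [x for x in hard if x != gold]
    chosen = rng.sample(hard, min(k, len(hard)))
    rest = [x for x in pool if x != gold and x not in chosen]
    chosen += rng.sample(rest, min(k - len(chosen), len(rest)))
    return chosen
```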

Primitive Candidates Filtering
In KBs, the top-ranked first-hop and second-hop primitives can be combined into two-hop paths by pairing each first-hop primitive with each second-hop primitive. However, some of the resulting paths may not exist in the KB. To provide valid primitive candidates to the generator, we filter out the second-hop primitives that cannot be reached from any of the top-ranked first-hop primitives.
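A minimal Python sketch of this reachability filter follows. It works on relation names only and assumes a hypothetical `reachable` lookup, precomputed from the KB, that maps each first-hop relation to the relations one hop further; the ranker's order of the surviving second-hop primitives is preserved.

```python
def filter_second_hop(top_first, top_second, reachable):
    """Drop second-hop primitives not reachable from any top-ranked first-hop primitive.

    reachable: {first-hop relation: set of relations one hop further}
    (a hypothetical lookup built from the KB).
    """
    allowed = set()
    for r in top_first:
        allowed |= reachable.get(r, set())
    return [r for r in top_second if r in allowed]


reachable = {"r1": {"a", "b"}, "r2": {"c"}}
print(filter_second_hop(["r1"], ["a", "c", "b"], reachable))  # ['a', 'b']
```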

Logical Form Generation by Composing Primitives
In the final stage, we employ a generator model to predict the target logical form by composing the top primitive candidates provided by the ranker. We use a T5 model (Raffel et al., 2020) as the basis of our logical form generator, as it demonstrates strong performance on various text generation tasks. We construct the input by concatenating the question and the top-k primitive candidates. As shown in Figure 3, the input for KBs is formatted as [X; <|second_hop|> primitives; <|first_hop|> primitives]. The input for DBs is formatted as shown in Figure 3. We train the model with teacher forcing: the target logical form is generated token by token, and the model is optimized with a cross-entropy loss. At inference time, we use beam search to decode the top-k logical forms autoregressively.
During training, the ranker often performs well and places the gold primitive at the top, but the generator can misuse this positional information. If the generator is biased by the ranking of primitives, it may generalize poorly to unseen data. To encourage the generator to focus on the semantic meaning of each primitive rather than its position, we shuffle the order of primitives in the input during training. In our experiments, the generator benefits from this shuffle augmentation and performs robustly against noise in the ranked primitives.
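The input construction and the shuffle augmentation can be sketched in Python as below, using the KB ordering [X; <|second_hop|> primitives; <|first_hop|> primitives]. The `'; '` and `' | '` separators and the function name are illustrative assumptions, not the exact serialization used by Uni-Parser.

```python
import random


def build_generator_input(question, primitives, category_order, shuffle=False, rng=None):
    """Concatenate the question with each category's top-k primitives.

    primitives: {category token: [primitive strings in ranker order]}.
    The '; ' and ' | ' separators are illustrative assumptions.
    """
    rng = rng or random.Random()
    parts = [question]
    for category in category_order:
        group = list(primitives.get(category, []))
        if shuffle:
            # Training-time augmentation: hide the ranker's ordering.
            rng.shuffle(group)
        parts.append(f"{category} " + " | ".join(group))
    return "; ".join(parts)


prims = {"<|second_hop|>": ["location.country.capital", "location.location.containedby"],
         "<|first_hop|>": ["alice people.person.nationality"]}
print(build_generator_input("what is the capital of alice's country?", prims,
                            ["<|second_hop|>", "<|first_hop|>"], shuffle=True))
```

Shuffling only within each category keeps the category tokens as stable anchors while removing the positional shortcut.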
Our generator learns to generate logical forms by understanding the meaning of their elements (primitives and operations) and composing them. Compared with RnG-KBQA (Ye et al., 2021), whose generator predicts logical forms from a list of logical form candidates, composition on the basis of primitives and operations is likely to help our model generalize better to unseen structured data.

Dataset and Evaluation
KBQA. (1) GrailQA (Gu et al., 2021) contains 64,331 questions and carefully splits the data to evaluate three levels of generalization in KBQA: the i.i.d. setting, compositional generalization to unseen compositions of KB schema, and zero-shot generalization to unseen KB schema. These settings make up 25%, 25%, and 50% of the test set, respectively. (2) WebQSP (Yih et al., 2016) evaluates KBQA approaches in the i.i.d. setting. It contains 4,937 questions and requires reasoning chains of up to 2 hops. Following Ye et al. (2021), we randomly sample 200 examples from the training set for validation.
DBQA. (1) Spider (Yu et al., 2018) is a multi-table text-to-SQL dataset containing 10,181 questions and 5,693 complex SQL queries on 200 databases, with no overlap among the train/dev/test databases. (2) WikiSQL (Zhong et al., 2017) is a single-table text-to-SQL dataset containing 80,654 questions and SQL queries distributed across 24,241 tables from Wikipedia; 49.6% of its dev tables and 45.1% of its test tables are not in the training set. Therefore, both datasets require models to generalize to unseen schema compositions (compositional generalization) and unseen schema (zero-shot generalization).
Evaluation Metrics. We use the official evaluation script of each dataset and report two metrics: logical form (program) exact match accuracy (EM) and answer F1 (F1).

Results on KBQA
We first test our approach on KBQA with the GrailQA and WebQSP datasets.

Implementation Details
For GrailQA and WebQSP, we use the entity linking results provided by Ye et al. (2021). After identifying a set of entities, we extract the primitives within 2 hops of the question entities. We initialize the primitive ranker with BERT-base-uncased. For each primitive category, 96 negative candidates are sampled. We train the ranker for 3 epochs with a learning rate of 1e-5 and a batch size of 8. Bootstrap sampling is applied after every epoch. Note that we use teacher forcing when training the ranker, i.e., we use ground-truth entity linking to enumerate training candidates. We base our generation model on T5-base (Raffel et al., 2020). We use the top-10 primitives from each category returned by the ranker and fine-tune the T5 generation model for 10 epochs with a learning rate of 3e-5 and a batch size of 8. We use a vanilla T5 generation model without syntactic constraints, which guarantees neither the syntactic correctness nor the executability of the produced logical forms. Therefore, we use an execution-augmented inference procedure, which is commonly used in previous semantic parsing work (Devlin et al., 2017; Ye et al., 2020). We first decode the top-k logical forms with beam search and then execute them in order until one yields a valid (non-empty) answer. If none of the top-k logical forms is valid, we take the top-ranked primitives from the ranker and use a rule-based method to compose a final logical form that is guaranteed to be executable. This procedure ensures that a valid logical form is found for every question.
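The execution-augmented inference loop described above can be sketched in Python as follows. The `execute` and `fallback` callables are hypothetical stand-ins for a KB query engine and for the rule-based composer over top-ranked primitives; this is a sketch of the control flow, not the actual implementation.

```python
def execution_guided_decode(beam_outputs, execute, fallback):
    """Return the first beam candidate whose execution yields a non-empty answer.

    execute(lf) is assumed to return a collection of answers, or raise on a
    malformed or non-executable logical form. fallback() is a hypothetical
    rule-based composer whose output is assumed to be executable.
    """
    for lf in beam_outputs:
        try:
            answers = execute(lf)
        except Exception:  # syntax errors, unknown relations, etc.
            continue
        if answers:
            return lf, answers
    lf = fallback()
    return lf, execute(lf)
```

Beam order is preserved, so the highest-scoring executable candidate wins, and the fallback is reached only when every beam candidate fails or returns an empty answer.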

Overall Evaluation
Tables 2 and 3 summarize the results on GrailQA and WebQSP, respectively. Our approach achieves the highest overall performance among all approaches. Compared with methods that enumerate logical forms, such as Bert Ranking and RnG-KBQA, our approach performs better in the compositional and zero-shot settings. In particular, in the compositional setting we obtain a 3.1% F1 improvement over the baselines on the dev set and a 1.3% EM improvement on the test set. This matches our expectation that our generator learns the composition of primitives. RnG-KBQA enumerates logical forms rather than primitives, so its generation module acts more like an auxiliary component that complements the enumerated logical forms. In contrast, our generation module composes logical forms from basic semantic units (primitives). With this sense of primitive composition, our model is better able to handle unseen compositions than RnG-KBQA. In the i.i.d. setting, our approach underperforms RnG-KBQA by 0.7% EM and 0.5% F1. We speculate that this is because our model needs to determine whether a question implies one- or two-hop reasoning on the KB, which is a difficult task, whereas the generator of RnG-KBQA already sees complete logical form candidates in its input and does not need to solve this problem.

Table 2 (GrailQA; EM and F1 per setting):
Model                                   Overall      I.I.D.       Compositional  Zero-Shot
                                        EM    F1     EM    F1     EM    F1       EM    F1
Bert Ranking (Test) (Gu et al., 2021)   50.6  58.0   59.9  67.0   45.5  53.9     48.6  55.7
ReTrack (Test) (Chen et al., 2021)      58.1  65.3   84.4  87.5   61.5  70.9     44.6  52.5
UnifiedSKG (Test) (Xie et al., 2022)    …

Table 3 (WebQSP):
Model                             EM     F1
Topic Units (Lan et al., 2019)    -      67.9
STAGG (Yih et al., 2015)          63.9   71.7
QGG (Lan and Jiang, 2020)         -      74.0
CBR (Das et al., 2021)            70.0   72.8
ReTrack (Chen et al., 2021)       -      71.0
RnG-KBQA (Ye et al., 2021)        …
… sampled questions on the GrailQA dataset. We also report the average running time per question on an A100 GPU. Our model takes 19.4 s, considerably faster than BERT+Ranking (76.3 s) and RnG-KBQA (53.5 s). Unlike logical form-based models, our model does not need to enumerate a large number of logical forms. Instead, only a small number of relevant primitives are considered, which leads to faster tokenization and more efficient ranking.

Results on DBQA
We also evaluate our approach on the DBQA task with the Spider and WikiSQL datasets.

Implementation Details
To construct the <|tb_cl_vl|> category primitives mentioned in Section 3.1, we find the cell values related to the question. Given a question and a DB, we compute string matches between question phrases of arbitrary length and the cell values under each column of all tables. Following Lin et al. (2020), we use a fuzzy matching algorithm to match the question to cell values mentioned in the DB. We also detect numeric values in the question and pair each of them with all column names to form primitives. Because column names in WikiSQL are often vague, such as "No.", "Pick #", and "Rank", we use cell values to supplement their meaning: the matched cell value is used to locate its row, and each column name is paired with the cell value in that row.

Table 5 (WikiSQL test):
Model                                 EM     F1
SQLova (Hwang et al., 2019)           80.7   86.2
X-SQL (He et al., 2019)               83.3   88.7
IE-SQL (Ma et al., 2020)              84.6   88.8
NL2SQL (Guo and Gao, 2019)            83.7   89.2
HydraNet (Lyu et al., 2020)           83.8   89.2
BRIDGE (Large) (Lin et al., 2020)     85.7   91.1
TAPEX (Liu et al., 2021a)             -      89.5
Uni-Parser (T5-Base)                  85.8   91.3
Uni-Parser (T5-Large)                 86.9   92.1
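The WikiSQL column-enrichment step described above can be sketched in Python as follows. The "column (cell)" output format and the function name are illustrative assumptions; the sketch only shows locating the row via the matched cell value and pairing each header with that row's cell.

```python
def enrich_columns(header, rows, matched_value):
    """Pair each column name with the cell from the row containing `matched_value`.

    Vague headers such as "Pick #" become e.g. "Pick # (7)". The output format
    is an illustrative assumption; if no row matches, headers are returned as-is.
    """
    for row in rows:
        if matched_value in row:
            return [f"{col} ({cell})" for col, cell in zip(header, row)]
    return list(header)


header = ["Player", "Pick #", "Rank"]
rows = [["John Smith", "12", "3"], ["Bob Lee", "7", "1"]]
print(enrich_columns(header, rows, "Bob Lee"))
```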
We initialize the primitive ranker with BERT-base-uncased and sample 48 negative candidates for each primitive category. We train the ranker for 10 epochs with a learning rate of 1e-5 and a batch size of 8. Bootstrap hard negative sampling is conducted every two epochs. We also use ground-truth entity linking to enumerate training candidates. For the generator on Spider, we use T5-base and T5-3B. We use the top-15 <|tb_cl|> category primitives and the top-5 <|tb_cl_vl|> category primitives returned by the ranker and fine-tune the T5-base model for 200 epochs with a learning rate of 5e-5 and a batch size of 64. The T5-3B model is trained on 16 A100 GPUs for 100 epochs with a batch size of 1024. On WikiSQL, we use T5-base and T5-large with the top-5 <|tb_cl|> and top-3 <|tb_cl_vl|> category primitives as generator input, and fine-tune the T5-base/large models for 20 epochs with a learning rate of 3e-5 and a batch size of 16.

Overall Evaluation
Tables 4 and 5 summarize the results on Spider and WikiSQL, respectively. On the challenging Spider dataset, our model achieves competitive performance among all baselines. Compared with generation models that use the whole DB table schema as input, such as BRIDGE and UnifiedSKG with T5-base, our model achieves a 3% improvement, suggesting the benefit of filtering out irrelevant schema candidates during primitive enumeration and ranking. Compared with other T5-3B models, our model achieves comparable performance with far fewer training epochs: T5-3B* is trained for 3K epochs, whereas we train for only 100. For WikiSQL, we compare against Text2SQL methods and an answer generation method (TAPEX) in Table 5, and Uni-Parser outperforms all the baselines.

Analysis
Ablation Study
We perform an ablation study on the effects of category tokens and hard negative sampling in ranking on WebQSP and WikiSQL; the results are shown in Table 6. No CG means that primitives are not differentiated by their categories. Categories are intended to help the ranker distinguish whether a question requires one- or two-hop reasoning, and No CG indeed performs worse than the setting with categories in the input. Comparing the settings with and without hard negatives (rightmost two columns), we see that the proposed hard negative sampling helps the ranker better distinguish the positive primitive from the negative ones. Moreover, the entity linking accuracy on WebQSP is 72.5, which is an upper bound for the ranking stage; our top-10 recall of 69.0 is therefore close to this oracle performance.
Candidate Size
To better understand the benefit of enumerating primitives in reducing the candidate size, we compare the numbers of enumerated primitives and logical forms on both KB and DB. Table 7 shows that on KBQA datasets, the number of our primitives is about one third of the number of logical forms used in previous state-of-the-art methods (Ye et al., 2021; Gu et al., 2021). In DBs, where the number of operations is usually larger, this efficiency advantage is more pronounced. While a logical form in a KB usually involves no more than two entities and two hops, a logical form in a DB is generally more complex and variable, and it contains more primitives and operations. Specifically, many frequently used SQL operations, such as SELECT, WHERE, ORDER BY, GROUP BY, and JOIN ON, can appear in one logical form, each with its own function. In our model, we only need to enumerate two types of primitives (<|tb_cl|> and <|tb_cl_vl|>), which suffice to generate complex logical forms. As a result, the number of primitives in DBs is 30 to 40 times smaller than the number of logical forms. This shows that the benefit of enumerating primitives in Uni-Parser holds across both KB and DB, despite their significant differences in structure and complexity.
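Top-K recall of the kind reported in Table 6 can be computed as in the Python sketch below. The per-question definition (a question counts as a hit only if all of its gold primitives appear in the top K) is an assumption; a micro-averaged recall over individual gold primitives is another common choice.

```python
def recall_at_k(ranked_lists, gold_sets, k):
    """Fraction of questions whose gold primitives all appear in the top-k ranked list.

    This per-question definition is an assumption; a micro-averaged recall over
    individual gold primitives is another common choice.
    """
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_sets)
               if set(gold) <= set(ranked[:k]))
    return hits / len(gold_sets)


ranked = [["a", "b", "c"], ["x", "y", "z"]]
gold = [{"b"}, {"z", "q"}]
print(recall_at_k(ranked, gold, 2))  # 0.5
```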

Case Study
For a more intuitive understanding of our model, Figure 4 compares the outputs of our model and of the logical form enumeration based model RnG-KBQA (Ye et al., 2021) on a concrete example. The top-5 ranked logical forms of RnG-KBQA contain much redundant information, and none of them matches the gold S-expression. In contrast, the output of our ranker is simple and contains the components of the gold S-expression. The ranker output is used as the input to the generator. The generator output of RnG-KBQA is the same as its top-1 ranked logical form, which indicates that its generator acts more like a rewriter that makes minor corrections to the input candidates. In comparison, our model finds the correct first- and second-hop primitives and generates the correct logical form. Notably, even though the correct primitive is not ranked first, our generator is still able to identify it.

Related Work
We focus on semantic parsing rather than directly predicting the answer, as semantic parsing is more explainable (Zhang and Balog, 2020; Lin et al., 2020). Many papers have applied seq2seq models to semantic parsing in either KB (Gu and Su, 2022; Ye et al., 2021) or table scenarios (Dong and Lapata, 2016; Lin et al., 2018), treating it as a translation problem that takes a natural language question as input and outputs a logical form.
Semantic Parsing on KBQA Past works have attempted to generate logical forms using a grammar-based bottom-up parser (Berant et al., 2013; Pasupat and Liang, 2015) or a seq2seq network (Hao et al., 2017; Zhang et al., 2019a). An alternative approach is to produce a list of logical form candidates and then use a ranker to find the ones that best match the intent of the question (Lan and Jiang, 2020; Sun et al., 2020; Luo et al., 2018). Ye et al. (2021) further employ a generation stage after ranking to revise or supplement existing logical form candidates. Rather than enumerating complete logical forms, our Uni-Parser only retrieves the relevant primitives, which greatly improves the parser's efficiency and compositional generalization ability. Recently, ArcaneQA (Gu and Su, 2022) proposed a generation-based model with dynamic program induction over the KB search space to improve the faithfulness of the generated programs, but it is still not as accurate as rank-based models. DecAF (Yu et al., 2022) jointly generates both logical forms and direct answers, which helps it leverage both KB and text to obtain better final answers.
Semantic Parsing on DBQA (Text2SQL) Text2SQL models take both the natural language question and the database table schema as input (Dou et al., 2022; Zhang et al., 2020b). To serialize the table schema, prior work commonly linearizes it as a table name followed by all of its column names. Lin et al. (2020) and Zhang et al. (2020a) further show that using table content as supplemental information in Seq2Seq models provides a better understanding of the table schema. Moreover, table content supports the prediction of the conditional part of the logical form, as mentioned by Yavuz et al. (2018). Shaw et al. (2020) first show that a pre-trained Seq2Seq model (Raffel et al., 2020) with 3 billion parameters achieves competitive performance on the Spider dataset. Scholak et al. (2021) propose a constrained decoding method that is compatible with various large pre-trained language models and achieves promising performance on Spider.
Unified Question Answering Many unified QA models convert structured (KB) or semi-structured (DB) data into unstructured text by directly linearizing the structured schema, which provides additional information for missing knowledge in open-domain textual QA (Oguz et al., 2020; Xie et al., 2022; Tay et al., 2022). Ma et al. (2021) further use a data-to-text generator to rewrite the linearized schema into natural language. Li et al. (2021) propose a hybrid QA model that either answers questions using text or generates SQL queries from the table schema on textual and tabular QA datasets. Our Uni-Parser takes a different direction: it efficiently parses questions into executable logical forms on both KB and DB in a unified framework.

Conclusion
For unified semantic parsing on both KB and DB structured data, we propose Uni-Parser, which has three modules: primitive enumeration, ranker, and compositional generator.Our enumeration at the primitive level rather than the logical-form level produces a smaller number of potential candidates, leading to high efficiency in the enumeration and ranker steps.Moreover, training a generator to produce the logical form from the primitives leads to a more generalized and robust compositional generator.Experimental results on both KB and DB QA demonstrate the advantages of Uni-Parser, especially in the compositional and zero-shot settings.

Acknowledgments
The authors would like to thank the members of Salesforce AI Research team for fruitful discussions, as well as the anonymous reviewers for their helpful feedback.

Limitations
The current Uni-Parser model needs to be trained independently on each dataset. In this work, we test it on four datasets, but with more datasets this process would become very time-consuming. A more unified approach would be to train a single model once on all datasets, from either KB or DB, that performs well on each of them.
The other limitation is that the current model needs to be told whether a question targets a KB or a DB. This makes it hard to apply the model in real-world settings, where the source that can answer a question is unknown. These limitations are challenging, and we leave them for future exploration.

Figure 1: Given the question and its knowledge source, the primitive enumeration process produces different categories of primitives for KB and DB.

Figure 4: Ranker output (shown in dotted boxes) and generator output (shown in Final output) comparison between our primitive-based method Uni-Parser and the logical form-based method RnG-KBQA on the GrailQA dev set. Our model generates the correct output, while the logical form-based model produces a wrong output that is the same as its top-1 ranked output.

Table 1 :
Primitives and Operations in KB and DB logical form.
For hard negative sampling in the case of KBs, the number of second-hop relations can grow exponentially compared with the first hop, so hard negative candidates for the second hop are sampled only from the primitives connected to the ground-truth first hop. In the case of DBs, for <|tb_cl|> category primitives, we treat candidates that share the ground-truth table name but have a different column name as hard negatives. For the <|tb_cl_vl|> category, we treat candidates with the same table and column name as the ground truth but a different cell value as hard negatives.

Table 2 :
Exact match (EM) and F1 scores on the test/dev splits of GrailQA. Baseline numbers are taken from the leaderboard and the corresponding papers. The reported models use BERT-base for the ranker and T5-base for the generator. Best results on dev are bolded, and test results better than dev are underlined.

Table 3 :
Exact match (EM) and F1 scores on the test split of WebQSP.The reported models are based on BERT-base model for ranker and T5-base for generator.

Table 5 :
Exact match (EM) and F1 scores on the test split of WikiSQL.

Table 6 :
Recall of the top-K ranked primitives on KB and DB (table) datasets. CG means category; HN means hard negative.

Table 7 :
The average numbers of candidate logical forms and of the two types of primitives in each dataset. LF denotes logical form.