Leveraging Abstract Meaning Representation for Knowledge Base Question Answering

Knowledge base question answering (KBQA) is an important task in Natural Language Processing. Existing approaches face significant challenges including complex question understanding, the necessity for reasoning, and the lack of large end-to-end training datasets. In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system that leverages (1) Abstract Meaning Representation (AMR) parses for task-independent question understanding; (2) a simple yet effective graph transformation approach to convert AMR parses into candidate logical queries that are aligned to the KB; and (3) a pipeline-based approach that integrates multiple, reusable modules that are trained specifically for their individual tasks (semantic parser, entity and relationship linkers, and neuro-symbolic reasoner) and do not require end-to-end training data. NSQA achieves state-of-the-art performance on two prominent KBQA datasets based on DBpedia (QALD-9 and LC-QuAD 1.0). Furthermore, our analysis emphasizes that AMR is a powerful tool for KBQA systems.


Introduction
Knowledge base question answering (KBQA) is a sub-field within Question Answering with desirable characteristics for real-world applications. KBQA requires a system to answer a natural language question based on facts available in a Knowledge Base (KB) (Zou et al., 2014; Vakulenko et al., 2019; Diefenbach et al., 2020; Abdelaziz et al., 2021). Facts are retrieved from a KB through structured queries (in a query language such as SPARQL), which often contain multiple triples that represent the steps or antecedents required for obtaining the answer. This enables a transparent and self-explanatory form of QA, meaning that intermediate symbolic representations capture some of the steps from natural language question to answer.
* Equal contribution, correspondence to Pavan Kapanipathi (kapanipa@us.ibm.com), Ibrahim Abdelaziz (ibrahim.abdelaziz1@ibm.com), Srinivas Ravishankar (srini@ibm.com)
With the rise of neural networks in NLP, various KBQA models approach the task in an end-to-end manner. Many of these approaches formulate text-to-query-language as a sequence-to-sequence problem, and thus require sufficient examples of paired natural language and target representations. However, labeling large amounts of data for KBQA is challenging, either due to the requirement of expert knowledge (Usbeck et al., 2017), or artifacts introduced during automated creation (Trivedi et al., 2017). Real-world scenarios require solving complex multi-hop questions (i.e., questions with secondary unknowns within a main question) and questions employing unusual expressions. Pipeline approaches can delegate language understanding to pre-trained semantic parsers, which mitigates the data problem, but are considered to suffer from error propagation. However, the performance of semantic parsers for well-established semantic representations has greatly improved in recent years. Abstract Meaning Representation (AMR) (Banarescu et al., 2013; Dorr et al., 1998) parsers recently reached above 84% F-measure (Bevilacqua et al., 2021), an improvement of over 10 points in the last three years.
In this paper we propose Neuro-Symbolic Question Answering (NSQA), a modular knowledge base question answering system with the following objectives: (a) delegating the complexity of understanding natural language questions to AMR parsers; (b) reducing the need for end-to-end (text-to-SPARQL) training data with a pipeline architecture where each module is trained for its specific sub-task; (c) facilitating the use of an independent reasoner via an intermediate logic form.
Figure 1: Real NSQA prediction for the sentence "Which actors starred in Spanish movies produced by Benicio del Toro?". Underlined, we show the representation for the two unknown variables across all stages, including: AMR-aligned tokens in the sentence (Which, movies), AMR graph (unknown, movie), paths representation (same as AMR), logical representation (actor, movie) and SPARQL interpretation (?actor, ?movie). Displayed stage outputs: AMR (green), Entity Linking (blue), Relation Linking (orange).

The contributions of this work are as follows:
• The first system to use Abstract Meaning Representation for KBQA, achieving state-of-the-art performance on two prominent datasets on DBpedia (QALD-9 and LC-QuAD 1.0).
• A novel, simple yet effective path-based approach that transforms AMR parses into intermediate logical queries that are aligned to the KB. This intermediate logic form facilitates the use of neuro-symbolic reasoners such as Logical Neural Networks (Riegel et al., 2020), paving the way for complex reasoning over knowledge bases.
• A pipeline-based modular approach that integrates multiple, reusable modules that are trained specifically for their individual tasks (e.g. semantic parsing, entity linking, and relationship linking) and hence do not require end-to-end training data.

Approach Overview
Figure 1 depicts the pipeline of our NSQA system. Given a question in natural language, NSQA: (i) parses questions into an Abstract Meaning Representation (AMR) graph; (ii) transforms the AMR graph to a set of candidate KB-aligned logical queries, via a novel but simple graph transformation approach; (iii) uses a Logical Neural Network (LNN) (Riegel et al., 2020) to reason over KB facts and produce answers to KB-aligned logical queries. We describe each of these modules in the following sections.

AMR Parsing
NSQA utilizes AMR parsing to reduce the complexity and noise of natural language questions. An AMR parse is a rooted, directed, acyclic graph. AMR nodes represent concepts, which may include normalized surface symbols, PropBank frames (Kingsbury and Palmer, 2002), as well as other AMR-specific constructs to handle named entities, quantities, dates and other phenomena. Edges in an AMR graph represent the relations between concepts, such as standard OntoNotes roles but also AMR-specific relations such as polarity or mode. As shown in Figure 1, AMR provides a representation that is fairly close to the KB representation. A special amr-unknown node indicates the missing concept that represents the answer to the given question. In the example of Figure 1, amr-unknown is a person, who is the subject of act-01. Furthermore, AMR helps identify intermediate variables that behave as secondary unknowns. In this case, the secondary unknown is a movie produced by Benicio del Toro in Spain.
NSQA utilizes a stack-Transformer transition-based model (Naseem et al., 2019; Astudillo et al., 2020) for AMR parsing. An advantage of transition-based systems is that they provide explicit question text to AMR node alignments. This allows encoding closely integrated text and AMR input to multiple modules (Entity Linking and Relation Linking) that can benefit from this joint input.

AMR to KG Logic
The core contribution of this work is our next step, where the AMR of the question is transformed to a query graph aligned with the underlying knowledge graph. We formalize the two graphs as follows. The AMR graph $G = \langle V_G, E_G \rangle$ is a rooted, edge-labeled, directed acyclic graph. The edge set $E_G$ consists of non-core roles, quantifiers, and modifiers. The vertex set satisfies $V_G \subseteq \{\text{amr-unknown}\} \cup A_P \cup A_C$, where $A_P$ is the set of PropBank predicates and $A_C$ is the set of remaining nodes. PropBank predicates are n-ary, with multiple edges based on their definitions. amr-unknown is a special concept node in the AMR graph indicating wh-questions.
Further, we enrich the AMR graph G with explicit links to entities in the KG. For example, the question in Figure 1 contains two entities, Spain and Benicio del Toro, that need to be identified and linked to the DBpedia entries dbr:Spain and dbr:Benicio_del_Toro. Linking these entities is necessary for any KBQA system (Zou et al., 2014; Vakulenko et al., 2019). To do so, we trained a BERT-based (Devlin et al., 2018) neural mention detection model and used BLINK (Wu et al., 2019b) for disambiguation. The entities are linked to AMR nodes based on the AMR node-text alignment information. The linking is a bijective mapping $V_e \to E$, where $V_e$ is the set of AMR entity nodes and $E$ is the set of entities in the underlying KG.
The query graph $Q = \langle V_Q, E_Q \rangle$ is a directed edge-labeled graph, which has a similar structure to the underlying KG. $V_Q \subseteq V_E \cup V$, where $V_E$ is a set of entities in the KG and $V$ is a set of unbound variables. $E_Q$ is a set of binary relations among $V_Q$ from the KG. The query graph Q is essentially the WHERE clause of the SPARQL query.
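To make the correspondence between a query graph and a SPARQL WHERE clause concrete, the following is a minimal sketch rather than NSQA's implementation. The function name `to_sparql`, the triple layout, and the subject/object orientation of each triple are assumptions made for illustration; the entities and relations are the ones discussed for the Figure 1 question.

```python
from typing import List, Tuple

# A query-graph edge: (subject, relation, object). Strings starting with '?'
# are unbound variables; everything else is a KG entity or relation.
Triple = Tuple[str, str, str]


def to_sparql(target: str, triples: List[Triple]) -> str:
    """Render a query graph as a SELECT query whose WHERE clause lists the edges."""
    where = " ".join(f"{s} {p} {o} ." for s, p, o in triples)
    return f"SELECT DISTINCT {target} WHERE {{ {where} }}"


# Hypothetical encoding of the Figure 1 query graph. The orientation of each
# triple is an assumption; only the relations and entities come from the text.
query_graph = [
    ("?movie", "dbo:starring", "?actor"),
    ("?movie", "dbo:country", "dbr:Spain"),
    ("?movie", "dbo:producer", "dbr:Benicio_del_Toro"),
]

print(to_sparql("?actor", query_graph))
```

Keeping the edges as plain triples means the same structure can be rendered either as SPARQL or as a conjunction of binary predicates.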
Our goal is to transform the AMR graph G into its corresponding query graph Q. However, this transformation faces the following challenges:
N-ary argument mismatch: The query graph Q represents information using binary relations, whereas the AMR graph contains PropBank framesets that are n-ary. For example, the node produce-01 from $A_P$ in G has four possible arguments, whereas its corresponding KG relation in Q (dbo:producer) is a binary relation.
Structural and granular mismatch: The vertex set of the query graph Q represents entities (or unbound variables). In contrast, the AMR graph G contains nodes that are concepts or PropBank predicates, which can correspond to both entities and relationships. For example, in Figure 1, produce-01, star-01, and Spain are nodes in the AMR graph. The AMR graph G therefore has to be transformed such that nodes primarily correspond to entities and edges (edge labels) correspond to relationships. Furthermore, it is possible for multiple predicates and concepts from G to jointly represent a single binary relation in Q because the underlying KG uses a completely different vocabulary. An example of such granular mismatch is shown in Figure 2.

Path-based Graph Transformation
We address the challenges mentioned above by using a path-based approach for the construction of query graphs. In KBQA, query graphs (i.e., SPARQL queries) constrain the unknown variable based on paths to the grounded entities. In Figure 1, the constraints in the SPARQL query are based on paths from ?actor to dbr:Benicio_del_Toro and dbr:Spain, as shown below.
• ?actor → dbo:starring → ?movie → dbo:country → dbr:Spain
• ?actor → dbo:starring → ?movie → dbo:producer → dbr:Benicio_del_Toro
Based on this intuition of finding paths from the unknown variable to the grounded entities, we have developed a path-based approach, depicted in Algorithm 1, that shows the steps for transforming the AMR graph G into the query graph Q. As amr-unknown is the unknown variable in the AMR graph, we retrieve all shortest paths (line 11 in Algorithm 1) between the amr-unknown node and the nodes $V_E$ of the AMR graph G that have mappings to the KG entity set. Figure 1 shows an example of both the AMR and query graph for the question "Which actors starred in Spanish movies produced by Benicio del Toro?" Selecting the shortest paths reduces the n-ary predicates of the AMR graph to only the relevant binary edges. For instance, the edge (act-01, arg0, person) in the AMR graph in Figure 1 will be ignored because it is not on the path between amr-unknown and either of the entities dbr:Spain and dbr:Benicio_del_Toro.
Structural and granularity mismatch between the AMR and query graph occurs when multiple nodes and edges in the AMR graph represent a single relationship in the query graph. This is shown in Figure 2. The path consists of one AMR node and two edges between amr-unknown and cocoa bean: (amr-unknown, location, pay-01, instrument, cocoa-bean). In such cases, we collapse all nodes that represent predicates (like pay-01, star-01, etc.) into an edge, and combine it with the surrounding edge labels, giving (location | pay-01 | instrument). This is done by line 18 of Algorithm 1, where the eventual query graph Q will have one edge with merged predicates from the AMR graph G between the non-predicates ($A_C$).
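The two steps described above (shortest paths from amr-unknown, then collapsing predicate nodes into edge labels) can be sketched as follows. This is an illustrative approximation, not Algorithm 1 itself: the edge list is a hypothetical simplification of the Figure 2 AMR, the graph is traversed as if undirected, and the regular expression used to recognize PropBank frames is an assumption.

```python
import re
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over the AMR graph, treated as undirected.
    Returns an alternating list [node, label, node, ...] or None."""
    adj = {}
    for head, label, tail in edges:
        adj.setdefault(head, []).append((label, tail))
        adj.setdefault(tail, []).append((label, head))
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for label, nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [label, nxt])
    return None


def is_predicate(node):
    """Assumption: PropBank frames look like 'lemma-NN' (e.g. pay-01)."""
    return re.fullmatch(r"[a-z-]+-\d\d", node) is not None


def collapse(path):
    """Merge each intermediate predicate node with its surrounding edge
    labels, yielding binary (subject, merged_label, object) triples."""
    triples, subject, labels = [], path[0], []
    last = len(path) - 1
    for i in range(1, len(path), 2):
        label, node = path[i], path[i + 1]
        labels.append(label)
        if is_predicate(node) and i + 1 < last:
            labels.append(node)  # predicate is absorbed into the edge label
        else:
            triples.append((subject, " | ".join(labels), node))
            subject, labels = node, []
    return triples


# Hypothetical AMR edges (head, label, tail) for the Figure 2 example.
amr_edges = [
    ("pay-01", "location", "amr-unknown"),
    ("pay-01", "instrument", "cocoa-bean"),
    ("pay-01", "arg0", "person"),  # not on the path, so it is dropped
]

path = shortest_path(amr_edges, "amr-unknown", "cocoa-bean")
print(collapse(path))
```

In this sketch the off-path edge to person never appears in the output, which mirrors how shortest-path selection discards irrelevant arguments of an n-ary predicate.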
Returning to the example in Figure 1, Algorithm 1 (line 25) outputs the query graph Q with the following two paths, which bear structural similarity to the knowledge graph:
• amr-unknown → star-01 → movie → produce-01 → Benicio del Toro
Note that in the above paths, edge labels reflect the predicates from the AMR graph (star-01, produce-01, and mod). Our next step is to resolve these edge labels to their corresponding relationships from the underlying KG. To do so, we perform relation linking as described below.
Relationship Linking. NSQA uses SemREL (Naseem et al., 2021), a state-of-the-art relation linking system that takes in the question text and AMR predicate as input and returns a ranked list of KG relationships for each triple. The Cartesian product of these lists represents a ranked list of candidate query graphs, and we choose the highest-ranked valid query graph (a KG subgraph with unbound variables). As shown in Figure 1, the output of this module produces query graph Q with star-01 and produce-01 mapped to the DBpedia relations dbo:starring and dbo:producer. This will be the WHERE clause of the final SPARQL query.
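The candidate-generation step can be illustrated with a short sketch. It assumes that each triple comes with a scored list of candidate relations and that a combination is scored by the product of its members' scores; the scoring rule, the scores themselves, and the `is_valid` callback (standing in for a check against the KG) are all hypothetical and are not taken from the relation linker.

```python
from itertools import product


def rank_query_graphs(candidates):
    """candidates: one list per query-graph edge of (relation, score) pairs.
    Returns every combination as (relations, combined_score), best first.
    Assumption: a combination's score is the product of its parts."""
    combos = []
    for combo in product(*candidates):
        score = 1.0
        for _, s in combo:
            score *= s
        combos.append(([rel for rel, _ in combo], score))
    combos.sort(key=lambda c: c[1], reverse=True)
    return combos


def first_valid(ranked, is_valid):
    """Return the highest-ranked combination accepted by is_valid, else None."""
    for relations, _ in ranked:
        if is_valid(relations):
            return relations
    return None


# Hypothetical ranked candidates for the two edges of the Figure 1 query graph.
candidates = [
    [("dbo:starring", 0.9), ("dbo:artist", 0.1)],   # for star-01
    [("dbo:producer", 0.8), ("dbo:author", 0.2)],   # for produce-01
]

ranked = rank_query_graphs(candidates)
# Stand-in validity check; a real system would test the candidate against the KG.
print(first_valid(ranked, lambda rels: True))
```

Separating ranking from validation keeps the KG lookup out of the enumeration loop, so only as many candidates as needed are checked.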

Logic Generation
Our query graph can be directly translated to the WHERE clause of the SPARQL query. We use existential first-order logic (FOL) as an intermediate representation, where the non-logical symbols consist of the binary relations and entities in the KB as well as some additional functions to represent SPARQL query constructs (e.g. COUNT). We use existential FOL instead of directly translating to SPARQL because: (a) it enables the use of any FOL reasoner, which we demonstrate in Section 2.3; (b) it is compatible with reasoning techniques beyond the scope of typical KBQA, such as temporal and spatial reasoning; (c) it can also be used as a step towards query embedding approaches that can handle incompleteness of knowledge graphs (Ren and Leskovec, 2020). The query graph from Section 2 can be written as a conjunction in existential first-order logic, as shown in Figure 1. The current logic form supports SPARQL constructs such as SELECT, COUNT, ASK, and SORT, which are reflected in the types of questions that our system is able to answer in Table 4. The heuristics to determine these constructs from AMR are as follows:
Query Type: This rule determines if the query will use the ASK or SELECT construct. Boolean questions will have AMR parses that either have no amr-unknown variable or have an amr-unknown variable connected to a :polarity edge (indicating a true/false question). In such cases, the rule returns ASK; otherwise it returns SELECT.
Target Variable: This rule determines which unbound variable follows the SELECT keyword. As mentioned in Section 2, the amr-unknown node represents the missing concept in a question, so it is used as the target variable for the query. The one exception is for questions that have an AMR predicate that is marked as imperative, e.g. in Figure 3 (middle) a question beginning with "Count the awards ..." will have count-01 marked as imperative. In these cases, the algorithm uses the arg1 of the imperative predicate as the target variable (see Algorithm 1, line 3).
Sorting: This rule detects the need for sorting by the presence of superlatives and quantities in the query graph prior to relation linking. Superlatives are parsed into AMR with most and least nodes and quantities are indicated by the PropBank frame have-degree-91, whose arguments determine: (1) which variable in V represents the quantity of interest, and (2) the direction of the sort (ascending or descending).
Counting: This rule determines if the COUNT aggregation function is needed by looking for the PropBank frame count-01 or the AMR edge :quant connected to amr-unknown, indicating that the question seeks a numeric answer. However, questions such as "How many people live in London?" can have :quant associated to amr-unknown even though the correct query will use dbo:population to directly retrieve the numeric answer without the COUNT aggregation function. We therefore exclude the COUNT aggregation function if the KB relation corresponding to :quant or count-01 has a numeric type as its range.
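The Query Type and Counting heuristics lend themselves to a compact sketch. The code below is an illustration under stated assumptions, not the system's implementation: AMR parses are represented as hypothetical (head, label, tail) triples, and the numeric-range test on the linked KB relation is passed in as a boolean rather than looked up in the KB.

```python
def amr_nodes(edges):
    """Collect every node that appears in a list of (head, label, tail) edges."""
    return {n for head, _, tail in edges for n in (head, tail)}


def query_type(edges):
    """ASK when there is no amr-unknown, or when amr-unknown hangs off a
    :polarity edge (a true/false question); SELECT otherwise."""
    if "amr-unknown" not in amr_nodes(edges):
        return "ASK"
    for head, label, tail in edges:
        if label == ":polarity" and "amr-unknown" in (head, tail):
            return "ASK"
    return "SELECT"


def needs_count(edges, relation_range_is_numeric):
    """COUNT is used when count-01 is present or :quant touches amr-unknown,
    unless the linked KB relation already returns a number."""
    has_count_frame = "count-01" in amr_nodes(edges)
    has_quant = any(
        label == ":quant" and "amr-unknown" in (head, tail)
        for head, label, tail in edges
    )
    return (has_count_frame or has_quant) and not relation_range_is_numeric


# Hypothetical, simplified parse of "How many people live in London?"
how_many = [
    ("live-01", ":ARG0", "person"),
    ("person", ":quant", "amr-unknown"),
    ("live-01", ":location", "London"),
]

print(query_type(how_many))                                   # SELECT
print(needs_count(how_many, relation_range_is_numeric=True))  # False
```

With a relation such as dbo:population, whose range is numeric, the sketch skips COUNT, matching the exception described for the London example.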

Reasoner
With the motivation of utilizing modular, generic systems, NSQA uses a first-order logic, neuro-symbolic reasoner called Logical Neural Networks (LNN) (Riegel et al., 2020). This module currently supports two types of reasoning: type-based and geographic. Type-based reasoning is used to eliminate queries based on inconsistencies with the type hierarchy in the KB. On the other hand, a question like "Was Natalie Portman born in the United States?" requires geographic reasoning because the entities related to dbo:birthPlace are generally cities, but the question requires a comparison of countries. This is addressed by manually adding logical axioms to perform the required transitive reasoning for the property dbo:birthPlace. We wish to emphasize that the intermediate logic and reasoning module allow for NSQA to be extended for such complex reasoning in future work.
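The kind of transitive geographic reasoning described here can be illustrated with a plain symbolic stand-in. The sketch below is not an LNN and does not use its API; it only shows the inference pattern in which a birthplace recorded at city level is lifted to the containing country. The `part_of` relation, the function names, and the placeholder entities are all assumptions made for illustration.

```python
def located_in(place, region, part_of):
    """True if `place` is `region` or `region` is reachable from `place`
    by following part_of links (a transitive closure along a chain)."""
    seen = set()
    while place is not None and place not in seen:
        if place == region:
            return True
        seen.add(place)
        place = part_of.get(place)
    return False


def born_in(person, region, birth_place, part_of):
    """Answer a boolean 'born in <region>?' question when the KB stores
    the birthplace at a finer granularity than the question asks about."""
    city = birth_place.get(person)
    return city is not None and located_in(city, region, part_of)


# Toy facts with placeholder entities (not real KB content).
birth_place = {"dbr:Person_A": "dbr:City_X"}
part_of = {"dbr:City_X": "dbr:Region_Y", "dbr:Region_Y": "dbr:Country_Z"}

print(born_in("dbr:Person_A", "dbr:Country_Z", birth_place, part_of))  # True
print(born_in("dbr:Person_A", "dbr:Country_W", birth_place, part_of))  # False
```

A direct lookup of the birthplace alone would answer False for the first query, which is why an explicit containment axiom is needed.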

Experimental Evaluation
The goal of the work is to show the value of AMR as a generic semantic parser on a modular KBQA system. In order to evaluate this, we first perform an end-to-end evaluation of NSQA (Section 3.2). Next, we discuss some qualitative and quantitative results on the value of AMR for different aspects of our KBQA system (Section 3.3). Finally, in support of our modular architecture, we evaluate the individual modules that are used in comparison to other state-of-the-art approaches (Section 3.4).
Table 1: NSQA performance on QALD-9 and LC-QuAD 1.0

Datasets and Metrics
To evaluate NSQA, we used two standard KBQA datasets on DBpedia. The QALD-9 (Usbeck et al., 2017) dataset has 408 training and 150 test questions in natural language, from DBpedia version 2016-10. Each question has an associated SPARQL query and gold answer set. Table 4 shows examples of all the question types in the QALD dev set. LC-QuAD 1.0 (Trivedi et al., 2017) is a dataset with 5,000 questions based on templates, and more than 80% of its questions contain two or more relations. Our modules are evaluated against a random sample of 200 questions from the training set. LC-QuAD 1.0 predominantly focuses on the multi-relational questions, aggregation (e.g. COUNT) and simple questions from Table 4.
Dev Set. We also created a randomly chosen development set of 98 QALD-9 and 200 LC-QuAD 1.0 questions for evaluating individual modules.
Metrics. We report performance based on standard precision, recall and F-score metrics for the KBQA system and other modules. For the AMR parser we use the standard Smatch metric (Cai and Knight, 2013).
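For reference, precision, recall and F-score over answer sets can be computed as in the sketch below. This is the standard set-based definition for a single question; how empty answer sets are scored and how per-question scores are averaged over a dataset are not specified here, so the zero-valued fallbacks in the code are an assumption.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one question.
    Assumption: an empty denominator yields a score of 0.0."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical answer sets for one question.
print(precision_recall_f1({"dbr:A", "dbr:B"}, {"dbr:A", "dbr:C"}))  # (0.5, 0.5, 0.5)
```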

End-to-end Evaluation
Baselines: We evaluate NSQA against four systems: gAnswer (Zou et al., 2014), QAmp (Vakulenko et al., 2019), WDAqua-core1 (Diefenbach et al., 2020), and a recent approach by Liang et al. (2021). gAnswer is a graph data-driven approach and is the state-of-the-art on the QALD dataset. QAmp is another graph-driven approach based on message passing and is the state-of-the-art on the LC-QuAD 1.0 dataset. WDAqua-core1 is a knowledge-base-agnostic approach that, to the best of our knowledge, is the only technique that has been evaluated on both QALD-9 and LC-QuAD 1.0 on different versions of DBpedia. Lastly, Liang et al. (2021) is a recent approach that uses an ensemble of entity and relation linking modules and trains a Tree-LSTM model for query ranking.
Table 2: stack-Transformer parser results
                    AMR3.0   QALD-9   LC-QuAD 1.0
stack-Transformer   80.00    87.91    84.03
Results: Table 1 shows the performance of NSQA compared to state-of-the-art approaches on the QALD-9 and LC-QuAD 1.0 datasets. NSQA achieves state-of-the-art performance on both. It outperforms WDAqua and gAnswer on QALD-9. Furthermore, NSQA outperforms QAmp on LC-QuAD 1.0 by 11.45 percentage points on F1. Due to the difference in evaluation setup in Liang et al. (2021), we reevaluated their system on the same setup and metrics as the above systems. Under this test set and evaluation, the F1 score of Liang et al. (2021) drops to 29.2%. We exclude this work from our comparison due to the lack of a standard evaluation.

Performance Analysis of AMR
AMR Parsing. We manually created AMRs for the train and dev sets of QALD and LC-QuAD 1.0 questions. The performance of our stack-Transformer parser on both of these datasets is shown in Table 2. The parser is trained on the combination of human-annotated treebanks and a synthetic AMR corpus. Human-annotated treebanks include AMR3.0 and 877 question sentences (250 QALD train + 627 LC-QuAD 1.0 train sentences) annotated in-house. The synthetic AMR corpus includes 27k sentences obtained by parsing LC-QuAD 1.0 and LC-QuAD 2.0 (Dubey et al., 2019) training sentences, along the lines of Lee et al. (2020).
AMR-based Query Structure. NSQA leverages many of the AMR features to decide on the correct query structure. As shown in Section 2.2.2, NSQA relies on the existence of certain nodes in the AMR parse, such as the PropBank predicates have-degree-91 and count-01 and the amr-unknown node, to decide which SPARQL constructs to add. In addition, the AMR parse determines the structure of the WHERE clause.
Table 3 reports the accuracy of these rules on the LC-QuAD 1.0 dev dataset. Overall, NSQA identified 64% of ASK (boolean) questions correctly and achieved more than 80% accuracy for COUNT and SELECT questions. Using AMR and the path-based approach, NSQA was able to correctly predict the total number of constraints with comparable accuracies of 79% and 70% for single and two-hop questions, respectively. NSQA finds the correct query structure for complex questions almost as often as for simple questions, completely independent of the KG. Figure 3 shows two examples illustrating how AMR lends itself to an intuitive transformation to the correct query graph, as well as a third example where we fail. Here the AMR semantic parse cannot be matched to the underlying KG, since 'side' is an extra intermediate variable that leads to an additional constraint in the query graph.
Supported Question Types. Table 4 shows the reasoning and question types supported by NSQA. Our transformation algorithm applied to AMR parses supports simple, multi-relational, count-based, and superlative question types. LNN performs geographic reasoning as well as type-based reasoning to rank candidate logic forms. Addressing comparative and temporal reasoning is a part of our future work.

Individual Module Evaluation
Entity and Relation Linking. NSQA's EL module (NMD+BLINK) consists of a BERT-based neural mention detection (NMD) network, trained on the LC-QuAD 1.0 training dataset comprising 3,651 questions with manually annotated mentions, paired with an off-the-shelf entity disambiguation model, BLINK (Wu et al., 2019b). We compare the performance of the NMD+BLINK approach with Falcon (Sakor et al., 2019) in Table 5. NMD+BLINK performs 24% better on F1 than Falcon (state-of-the-art) on the LC-QuAD 1.0 dev set and 3% better on the QALD-9 dev set. Similarly, we evaluate Relation Linking on both QALD and LC-QuAD 1.0 dev sets. In particular, we used SemREL (Naseem et al., 2021), a state-of-the-art relation linking approach that performs significantly better than both Falcon (Sakor et al., 2019) and SLING (Mihindukulasooriya et al., 2020) on various datasets. On LC-QuAD 1.0 dev, SemREL achieves F1 = 0.55 compared to 0.43 by SLING and 0.42 by Falcon. On QALD-9, SemREL achieves 0.54 compared to 0.64 and 0.46 F1 for SLING and Falcon, respectively.
Reasoner. We investigate the effect of using LNN as a reasoner equipped with axioms for type-based and geographic reasoning. We evaluated NSQA's performance under two conditions: (a) with an LNN reasoner with intermediate logic form and (b) with a deterministic translation of query graphs to SPARQL. On the LC-QuAD 1.0 dev set, NSQA achieves an F1 score of 40.5 using LNN compared to 37.6 with the deterministic translation to SPARQL. Based on these initial promising results,

Related Work
Early work in KBQA focused mainly on designing parsing algorithms and (synchronous) grammars to semantically parse input questions into KB queries (Zettlemoyer and Collins, 2007; Berant et al., 2013), with a few exceptions from the information extraction perspective that directly rely on relation detection (Yao and Van Durme, 2014; Bast and Haussmann, 2015). All the above approaches train statistical machine learning models based on human-crafted features, and their performance is usually limited.
Deep Learning Models. The renaissance of neural models significantly improved the accuracy of KBQA systems (Yu et al., 2017; Wu et al., 2019a). Recently, the trend favors translating the question to its corresponding subgraph in the KG in an end-to-end learnable fashion, to reduce human effort and feature engineering. This includes the two most commonly adopted directions: (1) embedding-based approaches that make the pipeline end-to-end differentiable (Bordes et al., 2015; Xu et al., 2019); (2) hard-decision approaches that generate a sequence of actions that forms the subgraph (Xu et al., 2018; Bhutani et al., 2019).
On domains with complex questions, like QALD and LC-QuAD, end-to-end approaches with hard decisions have also been developed. Some have primarily focused on generating SPARQL sketches (Maheshwari et al., 2019; Chen et al., 2020), where they evaluate these sketches (2-hop) by providing gold entities and ignoring the evaluation of selecting target variables or other aggregation functions like sorting and counting. Zheng and Zhang (2019) generate the question subgraph by filling the entity and relationship slots of 12 predefined question templates. Their performance on these datasets shows significant improvement due to the availability of these manually created templates. Because they rely on predefined templates, such systems do not offer a common ground for comparison with generic, non-template-based approaches such as NSQA, WDAqua, and QAmp.
Graph Driven Approaches. Due to the lack of enough training data for KBQA, several systems adopt a training-free approach. WDAqua (Diefenbach et al., 2017) uses a pure rule-based method to convert a question to its SPARQL query. gAnswer (Zou et al., 2014) uses a graph matching algorithm based on the dependency parse of the question and the knowledge graph. QAmp (Vakulenko et al., 2019) is a graph-driven approach that uses message passing over the KG subgraph containing all identified entities/relations, where confidence scores get propagated to the nodes corresponding to the correct answers. Finally, Mazzeo and Zaniolo (2016) achieved superior performance on QALD-5/6 with a hand-crafted automaton based on human analysis of question templates. A common theme of these approaches is that the process of learning the subgraph of the question is heavily KG-specific, while our approach first delegates the question understanding to KG-independent AMR parsing.
Modular Approaches. Frankenstein (Singh et al., 2018) is a system that emphasizes reusability, where the system learns weights for each reusable component conditioned on the questions. It neither focuses on KG-independent parsing (AMR), nor are its results comparable to any state-of-the-art approaches. Liang et al. (2021) propose a modular approach for KBQA that uses an ensemble of phrase mapping techniques and a TreeLSTM-based model for ranking query candidates, which requires task-specific training data.

Discussion
The use of semantic parses such as AMR compared to syntactic dependency parses provides a number of advantages for KBQA systems. First, independent advances in AMR parsing that serve many other purposes can improve the overall performance of the system. For example, on the LC-QuAD 1.0 dev set, a 1.4% performance improvement in AMR Smatch improved the overall system's performance by 1.2%. Recent work also introduces multilingual and domain-specific (biomedical) AMR parsers, which expands the possible domains of application for this work. Second, AMR provides a normalized form of input questions that makes NSQA resilient to subtle changes in input questions with the same meaning. Finally, AMR also transparently handles complex sentence structures such as multi-hop questions or imperative statements.
Nevertheless, the use of AMR semantic parses in NSQA comes with its own set of challenges: 1) Error propagation: Although AMR parsers are very performant (the state-of-the-art model achieves an Smatch of over 84%), inter-annotator agreement is only 83% on newswire sentences, as noted in Banarescu et al. (2013). Accordingly, AMR errors can propagate in NSQA's pipeline and cause errors in generating the correct answer. 2) Granularity mismatch: Our proposed path-based AMR transformation is generic and not driven by any domain-specific motivation, but additional adjustments to the algorithm may be needed in new domains due to the different granularity between AMR and SPARQL. 3) Optimization mismatch: Smatch, the optimization objective for AMR training, is sub-optimal for KBQA. NSQA requires a particular subset of paths to be correctly extracted, whereas the standard AMR metric Smatch focuses equally on all edge-node triples. We are therefore exploring alternative metrics and how to incorporate them into model training.

Conclusion and Future Work
To the best of our knowledge, NSQA is the first system that successfully harnesses a generic semantic parser, particularly AMR, for a KBQA task. Our path-based approach to map AMR to an underlying KG such as DBpedia is the first of its kind, with promising results in handling compositional queries. NSQA is a modular system where each module is trained separately for its own task, hence not requiring end-to-end KBQA training. In the future, we will explore the potential of the more expressive intermediate logic form with the neuro-symbolic reasoner for KBQA. In particular, we intend to focus on extending NSQA for temporal reasoning and making it robust to incompleteness and inconsistencies in knowledge bases.