A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering

Recently, end-to-end trained models for multiple-choice commonsense question answering (QA) have delivered promising results. However, such question-answering systems cannot be directly applied in real-world scenarios where answer candidates are not provided. Hence, a new benchmark challenge set for open-ended commonsense reasoning (OpenCSR) has recently been released, which contains natural science questions without any predefined choices. On the OpenCSR challenge set, many questions require implicit multi-hop reasoning and have a large decision space, reflecting the difficult nature of this task. Existing work on OpenCSR focuses solely on improving the retrieval process, which extracts relevant factual sentences from a textual knowledge base, leaving the important and non-trivial reasoning task out of scope. In this work, we extend the scope to include a reasoner that constructs a question-dependent open knowledge graph based on retrieved supporting facts and employs a sequential subgraph reasoning process to predict the answer. The subgraph can be seen as a concise and compact graphical explanation of the prediction. Experiments on two benchmark OpenCSR datasets show that the proposed model achieves competitive performance.


INTRODUCTION
Commonsense reasoning has long been considered an essential topic in artificial intelligence. Most approaches work in the setting of multiple-choice question answering (Lin et al., 2019; Feng et al., 2020), which selects an answer choice by scoring the question-choice pairs. However, the multiple-choice setting is not applicable in many real-world scenarios since many question-answering tasks do not provide answer candidates. As a step towards making commonsense reasoning research more realistic and useful, open-ended commonsense reasoning (OpenCSR) has been introduced (Lin et al., 2020), which explores a commonsense knowledge corpus to answer commonsense questions. OpenCSR often requires multi-hop reasoning, i.e., the model should conclude the answer by reasoning over two or more facts from the knowledge corpus, which makes this task much more challenging. Lin et al. (2020) proposed a retrieval-based method, called DrFact, that combines maximum inner product search with symbolic links between facts. However, DrFact puts little effort into a reasoning module that re-ranks the retrieved facts. To this end, we propose an integrated subgraph reasoning approach for OpenCSR with end-to-end learning, which iteratively employs a retriever to extract question-relevant facts from a knowledge corpus and a reasoner over the extracted facts. Given a commonsense question, the proposed approach applies DPR (Karpukhin et al., 2020) to extract relevant facts from a textual knowledge corpus, converts the retrieved natural language facts into a graph-structured format using Open Information Annotation (OIA) (Sun et al., 2020), and performs subgraph reasoning on the constructed joint OIA-graph using a multi-relational graph attention network. Specifically, the reasoner first performs entity linking from the given question to the joint OIA-graph. Then it starts from the linked entities (nodes) and iteratively samples relevant edges with a pruning procedure to form an enclosing subgraph around the question. The reasoning procedure takes into account both structural information, i.e., the graph structure of the joint OIA-graph, and semantic information, i.e., the language representation of questions and facts. After several rounds of retrieval and pruning, the model predicts the answer from the concepts in the subgraph.
Our contributions are as follows: (1) We investigate how to perform cooperative retrieval-and-reasoning in open-ended commonsense question answering. To the best of our knowledge, our work is the first retrieve-and-reason approach for OpenCSR. (2) We present experimental results showing that our model achieves competitive results on the benchmark OpenCSR datasets, together with an ablation study demonstrating the performance gain of integrating structural and semantic information. (3) The proposed method can potentially homogenize structured commonsense knowledge, i.e., knowledge bases, and unstructured commonsense knowledge, i.e., textual corpora, for answering open-ended commonsense questions, since it can unify both knowledge formats into a graph-structured format.

RELATED WORK
Commonsense Reasoning Traditional commonsense reasoning (CSR) techniques are mainly designed for multiple-choice QA. For instance, to independently score each answer choice, KagNet (Lin et al., 2019) and MHGRN (Feng et al., 2020) both leverage external commonsense knowledge graphs as structural priors. Although effective in selecting the best response for a multiple-choice question, these techniques are less useful for real-world situations because answer candidates are frequently unavailable. By fine-tuning a text-to-text transformer, UnifiedQA (Khashabi et al., 2020) generated answers to questions. However, a drawback of multiple-choice QA models is that they do not provide intermediate explanations for their answers, making them less suitable in many real-world scenarios. Lin et al. (2020) introduced open-ended commonsense reasoning and proposed DrFact to directly retrieve relevant facts and then use the concepts mentioned in the top-ranked facts as answer predictions.
Subgraph Reasoning Many recent works learn representations of localized subgraphs. Alsentzer et al. (2020) introduced a subgraph neural network to learn disentangled subgraph representations using a novel subgraph routing mechanism. Teru et al. (2020) proposed a graph neural network that reasons over local subgraph structures and performs inductive relation predictions. Han et al. (2020a) developed an explainable reasoning framework for forecasting future links on temporal knowledge graphs by employing a sequential reasoning process over local subgraphs.

OUR APPROACH
Retrieving Relevant Facts Following the dense passage retrieval work (Karpukhin et al., 2020), we use a bi-encoder transformer architecture that learns to maximize the inner product between the representation of a question and the representations of its relevant factual sentences, i.e., the sentences in the knowledge corpus that contain correct answers to the given question.
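As a minimal sketch of the retrieval step, the snippet below ranks facts by the inner product between a question vector and fact vectors. The vectors are hand-written stand-ins for the bi-encoder outputs, and the names `inner_product` and `retrieve_top_k` are hypothetical helpers introduced only for illustration; a real system would use an approximate maximum inner product search index rather than this exhaustive scan.

```python
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    question_vec: Sequence[float],
    fact_vecs: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (fact index, score) pairs for the k highest inner products."""
    scored = [
        (idx, inner_product(question_vec, vec))
        for idx, vec in enumerate(fact_vecs)
    ]
    # Exhaustive scan: sort all facts by score, highest first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, not encoder outputs).
question_vec = [1.0, 0.0, 1.0]
fact_vecs = [[0.5, 0.2, 0.1], [1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
top_facts = retrieve_top_k(question_vec, fact_vecs, k=2)
top_indices = [idx for idx, _ in top_facts]
```

Here the second fact has the largest inner product with the question, so it is ranked first.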
Constructing Question-dependent Joint OIA-Graph Following the steps in Sun et al. (2020), we convert each retrieved factual sentence into an OIA-graph, as shown in Figure 1a. We link each node in an OIA-graph with the nodes in the OIA-graphs of other sentences that include the same concept, and we label this kind of link as shared concepts. As shown in Figure 1b, the factoid sentences "Plants supply the fungi with carbohydrates, in return, making it a symbiotic relationship." and "Fungi participate in symbiotic relationships to obtain their food." share the concepts "fungi" and "symbiotic relationship". Linking the nodes that share concepts across different OIA-graphs in this way yields the joint OIA-graph $G_{joint}$.
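The linking step can be sketched as follows, assuming each OIA-graph has already been reduced to the list of concepts carried by its nodes (the OIA conversion itself is not shown). The function `link_shared_concepts`, the sentence ids, and the edge label string `"shared_concept"` are hypothetical names chosen for this illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (sentence id, concept)


def link_shared_concepts(
    graph_nodes: Dict[str, List[str]],
) -> List[Tuple[Node, str, Node]]:
    """Create a 'shared_concept' edge between nodes of different
    sentences that carry the same concept."""
    concept_to_nodes: Dict[str, List[Node]] = defaultdict(list)
    for sent_id, concepts in graph_nodes.items():
        for concept in concepts:
            concept_to_nodes[concept].append((sent_id, concept))

    edges = []
    for nodes in concept_to_nodes.values():
        for left, right in combinations(nodes, 2):
            if left[0] != right[0]:  # only link across sentences
                edges.append((left, "shared_concept", right))
    return edges


# Concepts of the two factoid sentences from Figure 1b.
graph_nodes = {
    "fact_1": ["plants", "fungi", "carbohydrates", "symbiotic relationship"],
    "fact_2": ["fungi", "symbiotic relationship", "food"],
}
edges = link_shared_concepts(graph_nodes)
linked_concepts = {left[1] for left, _, _ in edges}
```

For the two example sentences, this produces one link for "fungi" and one for "symbiotic relationship".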
Subgraph Reasoning on the Joint OIA-Graph Inspired by Han et al. (2020b), we conduct reasoning on a dynamically expanded inference graph $G_{inf}$ extracted from the joint OIA-graph. Given a commonsense question $q$, we build an initial inference graph via entity linking between $q$ and the joint OIA-graph: we find all nodes of $G_{joint}$ that share concepts with $q$ and set these OIA-nodes to be the initial nodes of $G_{inf}$. The inference graph expands by sampling one-hop neighbors of the initial nodes in $G_{joint}$. In addition, we propose a semantic-following operation to build skip connections between the initial nodes and their multi-hop neighbors. Taking a node $v$ in $G_{inf}$ as an example, we compute the inner-product similarity between its representation and the representations of the other nodes in $G_{joint}$ obtained by the retrieval, and add the top $K$ nodes to $G_{inf}$ by linking them with $v$. The semantic-following operation contributes in two ways: 1) it speeds up the reasoning process and broadens the receptive field of the subgraph reasoner by adding skip connections between multi-hop neighbors; 2) it allows the subgraph reasoner to take into account both semantically relevant and symbolically linked nodes for a given question.
Next, we feed $G_{inf}$ into a relational graph attention layer that takes node embeddings as input, computes an attention score for each edge indicating its relevance to the given question, and produces a question-dependent representation for each node using message passing. Instead of treating all neighbors with equal importance in the message passing, we take the question information into account and assign varying importance levels to each neighbor by calculating a question-dependent attention score. Here, $e^{l}_{vu}(q, p_k)$ is the attention score of the edge $(v, p_k, u)$ regarding the question $q$, $p_k$ corresponds to the edge type between the source node $v$ and the target node $u$, and $W^{l}_{s}$ and $W^{l}_{t}$ are two weight matrices for capturing the dependencies between question representations and node features, specified for the source node and the target node, respectively. $\mathbf{p}_k$ is the edge embedding indicating the relationship between $u$ and $v$. $h^{l-1}_{v}$ denotes the hidden representation of the node $v$ at the $(l-1)$-th inference step. When $l = 1$, i.e., for the first layer, $h^{0}_{v}$ is the aggregated token representation from BERT. Then, we compute the normalized attention score $\alpha^{l}_{vu}(q, p_k)$ using the softmax function. Once obtained, we aggregate the representations of the sampled neighbors of node $v$, denoted as $N_v$, and weight them using the normalized attention scores.
Table 1: Results of Hits@K and Rec@K (K=50/100) in % on OpenCSR. Columns: Model; ARC-Open (H@50, H@100, R@50, R@100); OBQA-Open (H@50, H@100, R@50, R@100).
Answer Prediction We compute the plausibility score $s^{l}_{v,q}$ of node $v$ being the answer to question $q$ at the $l$-th inference step. This score incorporates $s_{mips}(q, f_v)$, the relevance score, obtained by the maximum inner product search, of the retrieved fact $f_v$ that mentions the node $v$ with respect to the question $q$. Since the same concept may appear in different nodes in the inference graph, we aggregate the plausibility scores of the nodes that share the same concept to assign each concept a unique score: the plausibility score $s^{l}_{c_i,q}$ of concept $c_i$ is obtained by applying a score aggregation function $g(\cdot)$ to the plausibility scores of all nodes $v \in V_{G_{inf}}$ whose concept $v(c)$ equals $c_i$, where $V_{G_{inf}}$ is the set of nodes in the inference graph $G_{inf}$ and $v(c)$ represents the concept included in node $v$. Here we use the maximum function as $g(\cdot)$.
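The two aggregation steps described above (softmax-normalized weighting of neighbor representations, and maximum aggregation of node scores per concept) can be illustrated with a small sketch. The raw attention scores and node plausibility scores are taken as given inputs because the exact scoring functions are not reproduced here; the function names and the toy values are assumptions made for illustration only.

```python
import math
from typing import Callable, Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw attention scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_neighbors(
    neighbor_states: List[Sequence[float]],
    raw_scores: Sequence[float],
) -> List[float]:
    """Attention-weighted sum of the sampled neighbors' representations."""
    weights = softmax(raw_scores)
    dim = len(neighbor_states[0])
    return [
        sum(w * h[d] for w, h in zip(weights, neighbor_states))
        for d in range(dim)
    ]


def aggregate_concept_scores(
    node_scores: Dict[str, float],
    node_concept: Dict[str, str],
    g: Callable[[List[float]], float] = max,
) -> Dict[str, float]:
    """Give each concept one score by applying g (here: max) to the
    scores of all nodes that contain this concept."""
    grouped: Dict[str, List[float]] = {}
    for node, score in node_scores.items():
        grouped.setdefault(node_concept[node], []).append(score)
    return {concept: g(scores) for concept, scores in grouped.items()}


# Two neighbors with equal raw scores receive equal weight.
message = aggregate_neighbors([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# Nodes n1 and n2 both mention "fungi"; the larger score is kept.
concept_scores = aggregate_concept_scores(
    {"n1": 0.2, "n2": 0.7, "n3": 0.4},
    {"n1": "fungi", "n2": "fungi", "n3": "food"},
)
```

Using the maximum means a concept is as plausible as its best-supported mention; other choices of $g(\cdot)$, such as the sum or the mean, would plug into the same interface.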
Inference Graph Expansion and Pruning After several iterations of expansion, the inference graph $G_{inf}$ would grow rapidly and cover almost all nodes. To prevent the inference graph from exploding, we reduce its size by pruning the edges with small plausibility scores and keeping only the edges with the $K$ largest contribution scores. After running $L$ inference steps, the model selects the concept with the highest plausibility score in $G_{inf}$ as the answer to the given question, and the inference graph itself serves as a graphical explanation.
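A minimal sketch of the pruning step follows, assuming every edge of the inference graph already carries a scalar score. The edge triples, their scores, and the helper name `prune_edges` are illustrative assumptions, not values or code from the experiments.

```python
import heapq
from typing import Dict, Tuple

Edge = Tuple[str, str, str]  # (source node, edge type, target node)


def prune_edges(edge_scores: Dict[Edge, float], k: int) -> Dict[Edge, float]:
    """Keep only the k highest-scoring edges of the inference graph."""
    kept = heapq.nlargest(k, edge_scores.items(), key=lambda item: item[1])
    return dict(kept)


# Toy inference graph with assumed edge scores.
edge_scores = {
    ("fungi", "pred.arg.1", "participate"): 0.9,
    ("plants", "pred.arg.1", "supply"): 0.1,
    ("fungi", "shared_concept", "fungi"): 0.6,
    ("food", "as:pred.arg.2", "obtain"): 0.3,
}
pruned = prune_edges(edge_scores, k=2)
```

With `k=2`, only the two highest-scoring edges survive, which bounds the size of the graph passed to the next inference step.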
Loss Function We use the binary cross-entropy as the loss function, computed over the concepts in the inference graph of every question, where $C^{inf}_{q}$ represents the set of concepts in the inference graph of the question $q$, $y_{c_i,q}$ represents the binary label that indicates whether $c_i$ is an answer for $q$, and $Q$ denotes the question set. $s^{L}_{c_i,q}$ denotes the plausibility score of concept $c_i$ at the final inference step.
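For a single question, a binary cross-entropy over concept scores might look like the sketch below. It assumes the final plausibility scores already lie in (0, 1) and averages over the concepts of the inference graph; both the normalization and the clipping constant `eps` are assumptions of this example rather than details confirmed by the text.

```python
import math
from typing import Dict


def binary_cross_entropy(
    scores: Dict[str, float],
    labels: Dict[str, int],
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy over the concepts of one inference graph.

    scores: final plausibility score per concept, assumed to be in (0, 1).
    labels: 1 if the concept is a true answer; missing concepts count as 0.
    """
    total = 0.0
    for concept, score in scores.items():
        p = min(max(score, eps), 1.0 - eps)  # clip to avoid log(0)
        y = labels.get(concept, 0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


# Uninformative scores of 0.5 give a loss of log(2) per concept.
loss = binary_cross_entropy({"sun": 0.5, "soil": 0.5}, {"sun": 1})
```

Summing this quantity over all questions in the question set would give a corpus-level training objective under the same assumptions.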

EXPERIMENTS
Fact corpus and concept vocabulary Following the settings in Lin et al. (2020), the GenericsKB-Best corpus serves as the main commonsense knowledge source; it contains 1,025,413 unique facts. All sentences in the corpus are annotated with concepts, which are frequent noun chunks extracted using the spaCy toolkit. There are 80,524 concepts in total.
Datasets and evaluation metrics We evaluate our model on two benchmark open-ended commonsense reasoning datasets, i.e., ARC-Open and OBQA-Open (Lin et al., 2020), which contain 6,600 and 5,288 questions, respectively. Every question can be answered by multiple concepts, with an average of 6.8 and 7.7 answer concepts per question in ARC-Open and OBQA-Open, respectively. Each dataset provides the set of true answer concepts for each question. We use two metrics, Hits@K and Recall@K, where Hits@K denotes the percentage of times that at least one true concept appears in the top K ranked concepts.
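The two metrics can be sketched per question as follows. Hits@K follows the definition given above; the Recall@K shown here, the fraction of true answer concepts found among the top K predictions, is the common definition and is an assumption of this example. The ranked list and gold set are made-up toy data.

```python
from typing import List, Set


def hits_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """1.0 if at least one true concept is among the top k, else 0.0."""
    return 1.0 if any(concept in gold for concept in ranked[:k]) else 0.0


def recall_at_k(ranked: List[str], gold: Set[str], k: int) -> float:
    """Fraction of the true concepts that appear among the top k."""
    return len(set(ranked[:k]) & gold) / len(gold)


# Toy prediction for one question (assumed data).
ranked = ["water", "sun", "soil", "air"]
gold = {"sun", "air", "light"}

hit_2 = hits_at_k(ranked, gold, k=2)
recall_2 = recall_at_k(ranked, gold, k=2)
recall_4 = recall_at_k(ranked, gold, k=4)
```

Averaging these per-question values over a dataset and multiplying by 100 yields percentages of the kind reported as H@K and R@K.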

Experimental Results
We compare our model with DPR (Karpukhin et al., 2020), DrKIT (Dhingra et al., 2020), and DrFact (Lin et al., 2020). Recall that our model applies DPR as the retriever, so DPR is a straightforward baseline. DrFact is the strongest baseline in OpenCSR. As shown in Table 1, our model outperforms DPR and DrKIT on ARC-Open and achieves performance on par with DrFact. All results are averaged over three trials. We provide implementation details in Appendix C and attach the source code in the supplementary material.

Ablation Study Recall that the proposed subgraph reasoner takes into account both the structurally linked one-hop neighbors and the semantically relevant multi-hop neighbors when expanding the inference graph $G_{inf}$. Table 2 shows an ablation study in which we prevent the reasoner from adding semantically relevant multi-hop neighbors during inference graph expansion (denoted Model w/o SC); the comparison demonstrates the performance gain of integrating both structural and semantic information.

CONCLUSION
We present a novel graph-guided neural-symbolic commonsense reasoning approach for the open-ended commonsense reasoning task. The proposed method takes advantage of dense passage retrieval and a graph neural network reasoner to answer open-ended commonsense questions. Specifically, the graph reasoner integrates both structural dependency information between facts and semantic information by constructing an open information annotation graph and employing a semantic-following operation. The proposed model generates an inference graph for each question, which can be seen as a concise and compact graphical explanation of the prediction. The model achieves competitive performance on two benchmark datasets while being more interpretable.
Figure 1: (a) The OIA graph of "Plants supply the fungi with carbohydrates, in return, making it a symbiotic relationship." There are two types of nodes: constant and predicate. Constant nodes are simple nominal phrases, while predicate nodes include simple verbal phrases and prepositional phrases. Edges in OIA graphs are labeled: pred.arg.n denotes the n-th argument of a predicate node, mod indicates a modification, and as:pred.arg.n expresses the reversed relation of pred.arg.n. (b) The joint OIA graph consists of two factoid sentences that share the concepts "fungi" and "symbiotic relationship".

Table 2: Ablation study on ARC-Open (columns: Model, H@50, H@100, R@50, R@100). We investigate the gain of adding skip connections (SC) to semantically relevant multi-hop neighbors.