Multi-Hop Open-Domain Question Answering over Structured and Unstructured Knowledge

Open-domain question answering systems need to answer questions of interest using both structured and unstructured information. However, existing approaches either select only one source to generate the answer or conduct reasoning only on structured information. In this paper, we propose a Document-Entity Heterogeneous Graph Network, referred to as DEHG, to effectively integrate different sources of information and conduct reasoning on heterogeneous information. DEHG employs a graph constructor to integrate structured and unstructured information, a context encoder to represent nodes and the question, a heterogeneous information reasoning layer to conduct multi-hop reasoning over both information sources, and an answer decoder to generate answers to the question. Experimental results on the HybridQA dataset show that DEHG outperforms the state-of-the-art methods.


Introduction
Open-domain question answering (ODQA) is the task of answering questions in general domains with provided evidence (Chen and Yih, 2020; Sun et al., 2018b, 2019). The evidence can be categorized into unstructured text such as Wikipedia passages (Yang et al., 2018; Min et al., 2020; Izacard and Grave, 2021) and structured data such as WikiData/WikiTables (Pasupat and Liang, 2015; Chen et al., 2020b; Feng et al., 2022). In practice, an ideal ODQA model should be able to analyze evidence from both unstructured text and structured data, as each type of evidence has its own advantages: 1) unstructured text covers more general domains; 2) structured data offers better explainability for complex multi-hop reasoning.
One line of research accesses unstructured text and structured data independently (Sun et al., 2019; Xiong et al., 2019; Pan et al., 2021; Eisenschlos et al., 2021). The input question is sent to both an unstructured-text system (TextQA) and a structured knowledge-base system (KBQA), and one of the two is selected to output the final answer. These methods cannot properly combine the two sources of information. Recently, a new line of research aggregates heterogeneous information to find the answer (Chen et al., 2020b), constructing connections between passages and table data. However, this method conducts multi-hop reasoning only on table data, so it struggles with questions that require multi-hop reasoning over both sources.
In this work, we propose a novel Document-Entity Heterogeneous Graph Network (referred to as DEHG) for open-domain question answering, which can conduct multi-hop reasoning on aggregated heterogeneous information. DEHG comprises a graph constructor to integrate heterogeneous information sources, a context encoder to generate representations for nodes and the question, a heterogeneous information reasoning layer to explore multi-hop connectivity across both information sources, and an answer decoder to generate answers to the question.
Our contributions can be summarized as follows: (1) We examine how to homogenize structured and unstructured knowledge in open-domain question answering for multi-hop reasoning. To the best of our knowledge, our work is the first to conduct multi-hop reasoning on integrated heterogeneous information in open-domain question answering.
(2) We propose a Document-Entity Heterogeneous Graph Network to analyze the complex relations among heterogeneous information in open-domain question answering.
(3) We present experimental results that show DEHG outperforms the previous state of the art on the HybridQA dataset. We also perform an ablation study of our model to provide further insights.


Document-Entity Heterogeneous Graph Constructor
To cope with heterogeneous information, we propose a Document-Entity Heterogeneous Graph Constructor that enables rich interaction between heterogeneous information. We divide the graph-building process into two phases and describe them separately below. Linking: This phase aims to link the question to its related information in tables and passages from two sources. 1) Table Cell Matching: to link related table cells to the question, we follow three criteria: the table cell's value is explicitly mentioned in the question; the table cell's value is greater/less than a value mentioned in the question; the table cell's value is the maximum/minimum over the whole column when the question contains superlative words. 2) Passage Matching: this step links cells implicitly mentioned by the question through their hyperlinked passages. The linking model is a TF-IDF retriever with a 3-gram lexicon, which calculates distances to all passages in the pool and highlights those whose distance is lower than a threshold.
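The linking phase described above can be sketched as follows. The function names, the comparative-phrase heuristic, and the plain character 3-gram count vectors (standing in for the TF-IDF weighting) are illustrative assumptions, not the paper's exact implementation.

```python
import math
import re
from collections import Counter

SUPERLATIVES = {"most", "least", "highest", "lowest", "largest", "smallest"}

def link_cells(question, table):
    """Link table cells to the question via the three criteria.
    `table` is a list of rows; returns a set of (row, col) indices."""
    q = question.lower()
    linked = set()
    # A simple comparative pattern stands in for the paper's heuristic.
    cmp = re.search(r"(more|greater|higher|less|fewer) than (\d+(?:\.\d+)?)", q)
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            # Criterion 1: the cell value is explicitly mentioned.
            if str(cell).lower() in q:
                linked.add((r, c))
            # Criterion 2: the cell value is greater/less than a value
            # mentioned in the question.
            if cmp is not None and re.fullmatch(r"\d+(?:\.\d+)?", str(cell)):
                v, bound = float(cell), float(cmp.group(2))
                if cmp.group(1) in ("more", "greater", "higher"):
                    if v > bound:
                        linked.add((r, c))
                elif v < bound:
                    linked.add((r, c))
    # Criterion 3: column max/min when the question has a superlative.
    if SUPERLATIVES & set(q.split()):
        for c in range(len(table[0])):
            numeric = [(float(row[c]), r) for r, row in enumerate(table)
                       if re.fullmatch(r"\d+(?:\.\d+)?", str(row[c]))]
            if numeric:
                linked.add((max(numeric)[1], c))
                linked.add((min(numeric)[1], c))
    return linked

def char_ngrams(text, n=3):
    """Character 3-gram counts of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_distance(a, b):
    dot = sum(cnt * b[g] for g, cnt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def link_passages(question, passages, threshold=0.8):
    """Return indices of passages whose 3-gram cosine distance to the
    question is below the threshold."""
    q = char_ngrams(question)
    return [i for i, p in enumerate(passages)
            if cosine_distance(q, char_ngrams(p)) < threshold]
```

For example, a question containing "more than 1000" links every numeric cell above 1000, while a superlative such as "highest" links the extremes of each numeric column.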
Building: this phase builds a heterogeneous graph that connects all linked cells and their corresponding hyperlinked passages. The structure of a heterogeneous graph is shown in Figure 1. For a heterogeneous graph G = (V, E), V and E denote the set of nodes and the set of edges in the graph. The nodes V consist of the set of cells V_C and the set of phrases of hyperlinked passages V_P. The edges E have three types: Cell-Cell edges E_cc, which reflect the relations between cells; Cell-Phrase edges E_cp, which describe the hyperlink relation between a cell and a phrase; and Phrase-Phrase edges E_pp, which express the semantic relations between phrases in a passage.
We utilize Open Information Annotation (OIA), a predicate-function-argument annotation system for text, to split each passage into phrases and obtain the relations between phrases. Each cell is connected to the root phrase of its corresponding hyperlinked passage. All selected cells are connected to each other to transfer information between cells on the heterogeneous graph.
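A minimal sketch of the building phase. The graph container, the node-id scheme, and the `phrase_splitter` callback (standing in for the OIA annotation) are illustrative assumptions rather than the paper's exact implementation.

```python
from collections import defaultdict

class HeterogeneousGraph:
    """Document-entity graph G = (V, E): cell nodes V_C, phrase nodes V_P,
    and typed edge sets E_cc, E_cp, E_pp."""
    def __init__(self):
        self.nodes = {}                # node id -> {"type": ..., "text": ...}
        self.edges = defaultdict(set)  # edge type -> set of (u, v) pairs

    def add_node(self, nid, ntype, text):
        self.nodes[nid] = {"type": ntype, "text": text}

    def add_edge(self, u, v, etype):
        # Edges are stored in both directions (undirected graph).
        self.edges[etype].add((u, v))
        self.edges[etype].add((v, u))

def build_graph(linked_cells, passages, phrase_splitter):
    """linked_cells: {cell_id: (text, passage_id or None)};
    passages: {passage_id: text}; phrase_splitter(text) returns
    (phrases, phrase_edges, root_index), standing in for OIA."""
    g = HeterogeneousGraph()
    cells = list(linked_cells)
    for cid, (text, _) in linked_cells.items():
        g.add_node(cid, "cell", text)
    # E_cc: all selected cells are connected to each other.
    for i, u in enumerate(cells):
        for v in cells[i + 1:]:
            g.add_edge(u, v, "cell-cell")
    for cid, (_, pid) in linked_cells.items():
        if pid is None:
            continue
        phrases, phrase_edges, root = phrase_splitter(passages[pid])
        ids = [f"{pid}/p{i}" for i in range(len(phrases))]
        for nid, ph in zip(ids, phrases):
            g.add_node(nid, "phrase", ph)
        # E_pp: semantic relations between phrases within a passage.
        for a, b in phrase_edges:
            g.add_edge(ids[a], ids[b], "phrase-phrase")
        # E_cp: each cell connects to the root phrase of its passage.
        g.add_edge(cid, ids[root], "cell-phrase")
    return g
```

A trivial splitter that treats the whole passage as a single root phrase, `lambda p: ([p], [], 0)`, is enough to exercise the construction.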

Context Encoder
We use a BERT encoder to generate representations for every table cell, every phrase of a passage, and the question as the initial node embeddings in DEHG.
Each linked cell is encoded as a 4-element tuple (CONTENT, LOCATION, SOURCE, SCORE). CONTENT represents the string representation in the
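Before encoding, the tuple must be flattened into a single input string for BERT. The bracketed field markers below are illustrative assumptions, not the paper's exact serialization format.

```python
def serialize_cell(content, location, source, score):
    """Flatten a (CONTENT, LOCATION, SOURCE, SCORE) cell tuple into one
    string for the BERT encoder. The bracketed markers are hypothetical."""
    row, col = location
    return (f"[CONTENT] {content} [LOCATION] row {row} col {col} "
            f"[SOURCE] {source} [SCORE] {score:.2f}")
```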

Heterogeneous Information Reasoning
Message passing: we define how information propagates over the graph in order to perform reasoning in DEHG. According to the types of edges, the heterogeneous graph can be divided into three subgraphs: the Cell-Cell subgraph, the Cell-Phrase subgraph, and the Phrase-Phrase subgraph. In each subgraph, we follow the message-passing design of GCN (Kipf and Welling, 2017) to discriminate the importance of neighbors. To fuse the information of all subgraphs, we use question-based attention to learn a weight for each subgraph. With the learned weights as coefficients, we fuse the subgraph embeddings to produce the final node embedding.
Information Propagation: To explore the higher-order connectivity of cells and passages, we stack T layers of subgraph representation and subgraph integration. Each layer k takes the node embeddings from the previous layer as input and outputs updated node embeddings after the current diffusion process finishes. The updated node embeddings are sent to layer k + 1 for the next diffusion process.
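The two steps above can be sketched with NumPy as follows. The exact fusion function, the use of mean-pooled node embeddings as the subgraph summary, and the sharing of parameters across layers are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gcn_layer(H, A, W):
    """One GCN propagation step (Kipf and Welling, 2017):
    H' = relu(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

def reasoning_layer(H, subgraph_adjs, subgraph_Ws, q):
    """Propagate over each subgraph, then fuse the subgraph embeddings
    with question-based attention weights."""
    H_subs = [gcn_layer(H, A, W) for A, W in zip(subgraph_adjs, subgraph_Ws)]
    # Score each subgraph by the affinity of its mean node embedding to q.
    scores = np.array([h.mean(0) @ q for h in H_subs])
    weights = softmax(scores)
    return sum(w * h for w, h in zip(weights, H_subs))

def multi_hop(H, subgraph_adjs, subgraph_Ws, q, T=3):
    """Stack T reasoning layers; layer k+1 consumes layer k's output."""
    for _ in range(T):
        H = reasoning_layer(H, subgraph_adjs, subgraph_Ws, q)
    return H
```

Each call to `reasoning_layer` is one hop; stacking T of them lets information diffuse T hops across cells and phrases.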

Answer Decoder
The decoder sequentially generates the answer to the given question, represented as a sequence of pointers to cells of the tables and tokens of the passages. The pointers point to nodes in the heterogeneous graph.
The decoder is an LSTM with pointers (Vinyals et al., 2015) and attention (Bahdanau et al., 2015). It takes the semantic representations of nodes as input. At each decoding step t, the decoder receives the embedding of the previous item w_{t-1}, the context vector c_t, and the previous hidden state h_{t-1}, and produces the current hidden state:

h_t = \mathrm{LSTM}(w_{t-1}, c_t, h_{t-1})    (1)

We adopt the attention function in (Bahdanau et al., 2015) to calculate the context vector over the node representations n_i:

c_t = \sum_i \alpha_{t,i} n_i, \quad \alpha_{t,i} = \mathrm{softmax}_i\big(v_a^{\top} \tanh(W_a h_{t-1} + U_a n_i)\big)    (2)

The decoder then generates a pointer, from the set of pointers to cells in the tables and phrases in the passages, on the basis of the hidden state h_t. Specifically, it generates the pointer of item w according to the distribution

P(w) = \mathrm{softmax}\big(v^{\top} \tanh(W_1 h_t + W_2 n_w)\big)    (3)

where w is the pointer of node w, n_w is the representation of node w, v, W_1, and W_2 are trainable parameters, and the softmax is calculated over all possible pointers.
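A sketch of one decoding step, assuming the standard Bahdanau attention and pointer-network forms from the cited papers; a plain tanh RNN cell stands in for the LSTM, and all parameter names and shapes are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(h, node_reprs, Wa, va):
    """Bahdanau-style attention over node representations:
    score_i = va . tanh(Wa [h; n_i]), then c = sum_i alpha_i n_i."""
    scores = np.array([va @ np.tanh(Wa @ np.concatenate([h, n]))
                       for n in node_reprs])
    alpha = softmax(scores)
    return alpha @ node_reprs

def pointer_distribution(h, node_reprs, v, W1, W2):
    """P(w) proportional to exp(v . tanh(W1 h + W2 n_w)) over all nodes."""
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ n) for n in node_reprs])
    return softmax(scores)

def decode_step(w_prev, h_prev, node_reprs, params):
    """One decoding step: compute c_t from h_{t-1}, update the hidden
    state (tanh RNN cell as an LSTM stand-in), and emit a pointer
    distribution over the graph nodes."""
    c = attention_context(h_prev, node_reprs, params["Wa"], params["va"])
    x = np.concatenate([w_prev, c])
    h = np.tanh(params["Wx"] @ x + params["Wh"] @ h_prev)
    p = pointer_distribution(h, node_reprs, params["v"],
                             params["W1"], params["W2"])
    return h, p
```

At inference time the highest-probability pointer (or a beam over pointers) selects the next cell or phrase of the answer.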

Dataset
We evaluate our multi-hop reasoning model DEHG on the HybridQA dataset (Chen et al., 2020b).

Baselines
In the following experiments, we compare our approach against previously published state-of-the-art approaches on the HybridQA dataset.
HyBrider (Chen et al., 2020b): A hybrid model that combines heterogeneous information to find the answer. Unsupervised-QG (Pan et al., 2021): An unsupervised framework that generates questions by first selecting/generating relevant information from each data source. DocHopper: A multi-hop retrieval method that retrieves paragraphs or sentences. Pointer (Eisenschlos et al., 2021): A Transformer architecture that uses attention heads to attend to either rows or columns of a table.

Evaluation Measures
We use the following automatic evaluation metrics in our experiments. Exact Match (EM): measures the proportion of predicted spans that match the ground-truth factoid exactly. Token-Level F1: we treat the predicted spans and ground-truth factoids as bags of tokens and compute F1.
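The two metrics can be implemented as follows, assuming the usual SQuAD-style answer normalization (lowercasing, removing articles and punctuation); the paper does not spell out its exact variant.

```python
import re
from collections import Counter

def normalize(s):
    """Lowercase, drop articles and punctuation, collapse whitespace."""
    s = s.lower()
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    s = re.sub(r"[^\w\s]", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    """1.0 if the normalized prediction equals the normalized gold span."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    """Token-level F1 over the bags of tokens of prediction and gold."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level EM and F1 are then simple averages of these per-example scores.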

Implementation Details
We use the pre-trained BERT model (BERT-Base, Uncased), which has 12 hidden layers of 768 units and 12 self-attention heads, to encode cells, phrases, and questions. The hidden size of the LSTM decoder is also 768. The dropout probability is 0.1. We use beam search for decoding, with a beam size of 5. The batch size is set to 4. Adam (Kingma and Ba, 2015) is used for optimization with an initial learning rate of 1e-4. We implement the algorithm on the PaddlePaddle deep learning platform (Ma et al., 2019).
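For reference, the settings above collected into a single configuration dict; the key names are illustrative, not from the paper's code.

```python
# Hyperparameters as reported in the implementation details above.
CONFIG = {
    "encoder": "bert-base-uncased",   # 12 layers, 768 units, 12 heads
    "decoder": "lstm",
    "decoder_hidden_size": 768,
    "dropout": 0.1,
    "beam_size": 5,
    "batch_size": 4,
    "optimizer": "adam",
    "initial_learning_rate": 1e-4,
}
```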

Experimental Results
In Table 1, we show the results of our proposed graph-based DEHG model on both the development and test sets and compare them with previously published results. Our proposed DEHG works significantly better than the baselines in terms of EM and F1 on HybridQA. The results indicate that DEHG is a general and effective model for multi-hop question answering over tabular and textual data. Specifically, DEHG can leverage both cells and phrases for question answering, and it effectively handles multi-hop reasoning on the heterogeneous graph.

Effect of BERT: To investigate the effectiveness of using BERT in the context encoder, we replace BERT with a bi-directional LSTM and run the model on HybridQA. As shown in Figure 2, the performance of the BiLSTM-based model DEHG-w/oBert in terms of EM and F1 decreases compared with DEHG. This indicates that the BERT-based context encoder creates more accurate representations for tabular and textual data and for question understanding.

Effect of Heterogeneous Information Reasoning:
To investigate the effectiveness of the heterogeneous graph, we compare DEHG with DEHG-w/oGraph, which eliminates the heterogeneous information graph, and DEHG-w/oMulti-hop, which removes the multi-hop information propagation. From Figure 2, one can observe that without the heterogeneous information graph the performance deteriorates considerably. In addition, the performance of DEHG-w/oGraph is inferior to that of DEHG-w/oMulti-hop. Thus, utilizing the heterogeneous graph to represent the multi-hop relations between passages and tables is desirable.
Effect of Pointer Decoder: To investigate the effectiveness of the pointer generation mechanism, we directly generate words from the vocabulary instead of generating pointers in the decoding process. Figure 2 also shows the results of DEHG-w/oPointer. From the results we can see that pointer generation is crucial for copying answers from cells and passages, since HybridQA contains a large number of questions whose answers are extracted from the tabular and textual data.

Related Work
Most work on QA uses structured and unstructured data independently (Talmor and Berant, 2018; Sun et al., 2018a; Kwiatkowski et al., 2019; Sun et al., 2019; Xiong et al., 2019; Chen et al., 2020a; Zhang et al., 2020; Pan et al., 2021; Eisenschlos et al., 2021; Yu et al., 2021). These approaches use an unstructured-text system (TextQA) and a structured knowledge-base system (KBQA) to exploit the different kinds of information, and thus cannot integrate different sources of information. More recent methods aggregate heterogeneous information to find the answer (Chen et al., 2020b; Feng et al., 2021). However, they only conduct multi-hop reasoning on table data, and struggle with questions that require multi-hop reasoning over both sources.

Conclusion
We have proposed a new approach to multi-hop question answering over tabular and textual data. The approach, referred to as DEHG, treats question answering as a problem of reasoning about answers on the basis of a heterogeneous information graph. DEHG employs BERT to encode questions and passages and generates pointers when decoding answers. Experimental results show that DEHG significantly outperforms the state-of-the-art methods.