GLGR: Question-aware Global-to-Local Graph Reasoning for Multi-party Dialogue Reading Comprehension



Introduction
Multi-party Dialogue Reading Comprehension (MDRC) is a special Machine Reading Comprehension (MRC) task. It involves answering questions conditioned on the utterances of multiple interlocutors (Yang and Choi, 2019; Li et al., 2020). MDRC presents unique challenges in two respects:

• MDRC relies heavily on multi-hop reasoning, where the necessary clues for reasoning are discretely distributed across utterances.
Figure 1: (a) An example dialogue (Li et al., 2020), with the baseline predictions "Only if you do an upgrade." and "Can't I just copy over the os and leave the data files untouched." (b) Two question-answering pairs of the dialogue; the correct answer to each question is highlighted in the dialogue with the same color as the question. (c) The QIUG of the dialogue, where edges between utterances indicate the discourse relationship between them (replying and replied-by). (d) The LSRG subgraph of U4 and U5 in the dialogue.
• Multi-hop reasoning suffers from discontinuous utterances and disordered conversations (see Figure 1-a,b).
Recently, a variety of graph-based multi-hop reasoning (abbr., graph reasoning) approaches have been proposed to tackle MDRC (Li et al., 2021; Ma et al., 2021, 2022). Graph reasoning is generally effective for bridging across the clues hidden in discontinuous utterances, with less interference from the redundant and distracting information occurring in disordered conversations. The effectiveness is attributed primarily to the perception of the global interactive relations in interlocutor-utterance graphs.
However, existing approaches encounter two bottlenecks. First, the question-disregarded graph construction methods (Li et al., 2021; Ma et al., 2021) fail to model the bi-directional interactions between the question and utterances. As a result, reasoning is prone to involving question-unrelated information. Second, the inner token-level semantic relations in every utterance are omitted, making it difficult to perceive the exact and unabridged clues occurring in local contexts.
To address the issues, we propose a Global-to-Local Graph Reasoning approach (GLGR) with Pre-trained Language Models (PLMs) (Kenton and Toutanova, 2019; Clark et al., 2020) as backbones. It encodes two heterogeneous graphs, including the Question-aware Interlocutor-Utterance Graph (QIUG) and the Local Semantic Role Graph (LSRG). QIUG connects the question with all utterances in the canonical interlocutor-utterance graph (Figure 1-c). It depicts the global interactive relations. By contrast, LSRG interconnects fine-grained nodes (tokens, phrases and entities) in an utterance in terms of their semantic roles, where semantic role labeling (Shi and Lin, 2019) is used. It signals the local semantic relations. To enable connectivity between the LSRGs of different utterances, we employ coreference resolution (Lee et al., 2018) and synonym identification to identify shareable nodes (Figure 1-d).
Methodologically, we develop a two-stage encoder network for progressive graph reasoning. It is conducted by successively encoding QIUG and LSRG, where attention modeling is used. The attentive information squeezed from QIUG and LSRG is respectively used to emphasize the global and local clues for answer prediction. Accordingly, the representation of the input is updated step-by-step during the progressive reasoning process. A residual network is used for information integration.
We carry out GLGR within an extractive MRC framework, where a pointer network (Vinyals et al., 2015) is used to extract answers from utterances. The experiments on Molweni (Li et al., 2020) and FriendsQA (Yang and Choi, 2019) demonstrate three contributions of this study, including:

• The separate use of QIUG and LSRG for graph reasoning yields substantial improvements, compared to PLM-based baselines.
The rest of the paper is organized as follows. Section 2 presents the details of GLGR. We discuss the experimental results in Section 3, and overview the related work in Section 4. We conclude the paper in Section 5.

Approach
The overall architecture of the GLGR-based MDRC model is shown in Figure 2. First of all, we utilize a PLM to initialize the representations of the question, interlocutors and utterances. On this basis, the first-stage graph reasoning is conducted over QIUG, where a Graph Attention Network (GAT) (Veličković et al., 2018) is used for encoding. Subsequently, we carry out the second-stage graph reasoning over LSRG, where graph transformer layers (Zhu et al., 2019) are used for encoding. Finally, we concatenate the initial hidden states and their updated versions obtained by GLGR, and feed them into the pointer network for answer prediction.
In the following subsections, we present the computational details after giving the task definition.

Task Definition
Formally, the task is defined as follows. Given a multi-party dialogue D = {U_1, U_2, ..., U_n} with n utterances and a question Q, MDRC aims to extract the answer A to Q from D. When the question is unanswerable, A is assigned the tag "Impossible". Note that each utterance U_i comprises the name of the interlocutor S_i who issues the utterance, as well as the concrete content of the utterance.
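To make the input and output formats concrete, the following minimal sketch shows one possible representation of an MDRC example; the field names are illustrative and are not taken from the released datasets.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Utterance:
    speaker: str   # interlocutor name S_i
    text: str      # concrete content of the utterance

@dataclass
class MDRCExample:
    dialogue: List[Utterance]      # D = {U_1, ..., U_n}
    question: str                  # Q
    answer: Optional[str] = None   # extracted answer span A, if any
    impossible: bool = False       # unanswerable questions carry the "Impossible" tag
```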

Preliminary Representation
We follow Li and Zhao (2021)'s study to encode the question Q and dialogue D using the PLM, so as to obtain the token-level hidden states H of all tokens in Q and D. Specifically, we concatenate the question Q and dialogue D to form the input sequence X, and feed X into the PLM to compute H:

X = [CLS] Q [SEP] D [SEP],   H = PLM(X; θ)

where [CLS] and [SEP] denote special tokens.
The hidden states H ∈ R^{l×d} serve as the universal representation of X, where l is the maximum length of X, and d is the hidden size. The symbol θ denotes all the trainable parameters of the PLM. In our experiments, we consider three different PLMs as backbones in total, including BERT-base-uncased (BERT_base), BERT-large-uncased (BERT_large) (Kenton and Toutanova, 2019) and ELECTRA (Clark et al., 2020).
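For illustration, a minimal sketch of this encoding step with the Hugging Face transformers library is given below. The small ELECTRA checkpoint is used here only for brevity; the exact preprocessing (e.g., speaker-name handling) used by the authors may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Sketch: X = [CLS] Q [SEP] D [SEP] is fed to a PLM to obtain H (shape l x d).
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-discriminator")
plm = AutoModel.from_pretrained("google/electra-small-discriminator")

question = "How does Qkslvrwolf use the different icon?"
dialogue = "_jason: Probably need to edit the icon theme. C-O-L-T: Help me in installing tovid."

inputs = tokenizer(question, dialogue,
                   truncation=True, max_length=512,  # l: maximum input length
                   return_tensors="pt")
with torch.no_grad():
    H = plm(**inputs).last_hidden_state  # (1, l, d): universal representation of X
```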
To facilitate understanding, we clearly define the different levels of hidden states as follows:

• H refers to the hidden states of all tokens in X, i.e., the universal representation of X. h is the hidden state of a token x (x ∈ X).
• O is the hidden state of a specific node, known as the node representation. Specifically, O_q, O_s and O_u denote the representations of the question node, interlocutor node and utterance node.

Global-to-Local Graph Reasoning
We carry out Global-to-Local Graph Reasoning (GLGR) to update the hidden states H of the input sequence X. GLGR is fulfilled by two-stage progressive encoding over two heterogeneous graphs, including QIUG and LSRG.

Global Graph Reasoning on QIUG
QIUG-QIUG is an expanded version of the canonical interlocutor-utterance graph (Li et al., 2021; Ma et al., 2021) due to the involvement of question-oriented relations, as shown in Figure 1-(c). Specifically, QIUG comprises one question node, N_u utterance nodes and N_s interlocutor nodes. We connect the nodes using the following scheme:

• The question node is linked to all utterance nodes. Bidirectional connection is used, in the directions of "querying" and "queried-by".
• Each interlocutor node is connected to all the utterance nodes she/he issued. Bidirectional connection is used, in the directions of "issuing" and "issued-by".
• Utterance nodes are connected to each other in terms of Conversational Discourse Structures (CDS) (Liu and Chen, 2021; Yu et al., 2022). CDS is publicly available in Molweni (Li et al., 2020), though undisclosed in FriendsQA (Yang and Choi, 2019). Therefore, we apply Liu and Chen (2021)'s open-source CDS parser to process FriendsQA. Bidirectional connection is used, i.e., in the directions of "replying" and "replied-by".
Consequently, QIUG contains six types of interactive relations T = {querying, queried-by, issuing, issued-by, replying, replied-by}. Each directed edge in QIUG solely signals one type of relation.
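The sketch below illustrates one possible construction of the QIUG edge list under the scheme above; it assumes the CDS parser returns (reply_from, reply_to) utterance-index pairs, and node and relation labels are illustrative.

```python
# Sketch of QIUG construction from speaker assignments and CDS reply links.
def build_qiug(speakers, reply_pairs):
    """speakers[i] is the interlocutor who issued utterance i;
    reply_pairs contains (i, j) meaning utterance i replies to utterance j."""
    edges = []  # (source, target, relation)
    for i, s in enumerate(speakers):
        u = f"U{i}"
        edges.append(("Q", u, "querying"))
        edges.append((u, "Q", "queried-by"))
        edges.append((s, u, "issuing"))
        edges.append((u, s, "issued-by"))
    for i, j in reply_pairs:
        edges.append((f"U{i}", f"U{j}", "replying"))
        edges.append((f"U{j}", f"U{i}", "replied-by"))
    return edges

# Example: three utterances, U1 and U2 both reply to U0.
print(build_qiug(["_jason", "C-O-L-T", "_jason"], [(1, 0), (2, 0)]))
```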
Node Representation-For an interlocutor node S_i, we consider the tokens of her/his name and look up their hidden states in the universal representation H. We aggregate the hidden states by mean pooling (Gholamalinezhad and Khosravi, 2020). The resultant embedding O_{s_i} ∈ R^{1×d} is used as the node representation of S_i.
For an utterance node U_i, we aggregate the hidden states of all tokens in U_i. Attention pooling (Santos et al., 2016) is used for aggregation. The resultant embedding O_{u_i} ∈ R^{1×d} is used as the node representation of U_i. We obtain the representation O_q of the question node Q in a similar way.
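A minimal sketch of the two pooling operations is given below; the parameterization of attention pooling is one common variant and is an assumption here.

```python
import torch
import torch.nn as nn

# Sketch: mean pooling for interlocutor-name tokens, attention pooling for
# utterance/question tokens, both producing a (1, d) node representation.
class AttentionPooling(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.scorer = nn.Linear(d, 1)

    def forward(self, h):                                # h: (num_tokens, d)
        weights = torch.softmax(self.scorer(h), dim=0)   # (num_tokens, 1)
        return (weights * h).sum(dim=0, keepdim=True)    # O: (1, d)

d = 768
h_utterance = torch.randn(12, d)                     # hidden states of one utterance's tokens
O_u = AttentionPooling(d)(h_utterance)               # utterance node representation
O_s = torch.randn(3, d).mean(dim=0, keepdim=True)    # mean-pooled interlocutor-name tokens
```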
Multi-hop Reasoning-Multi-hop reasoning is used to discover and package co-attentive information across nodes and along edges. Methodologically, it updates the hidden states of all tokens in a node using the attentive information Ô squeezed from the neighboring nodes. Formally, the hidden state of each token is updated by: ĥ = [h; Ô], i.e., concatenating h ∈ R^{1×d} and Ô ∈ R^{1×d}.
We utilize an L_1-layer GAT (Veličković et al., 2018) network to compute Ô, where attention-weighted information fusion is used:

Ô_i^(l) = Σ_{j∈E_i} α_{i,j} W^(l) O_j^(l-1)

where E_i comprises the set of neighboring nodes of the i-th node O_i, W^(l) is a trainable parameter, and the superscript (l) signals the computation at the l-th layer of GAT. Besides, the output Ô_i of the last layer is used to update the hidden states of all tokens in the i-th node.
Divide-and-conquer Attention Modeling-Different interactive relations have distinctive impacts on attention modeling. For example, the "queried-by" relation (i.e., an edge directed from an utterance to the question) most likely portends the payment of more attention to the possible answer in the utterance. By contrast, the "replying" and "replied-by" relations (i.e., edges between utterance nodes) induce the payment of more attention to the complementary clues in the utterances. In order to distinguish between these impacts, we separately compute node-level attention scores α for different types of edges in QIUG. Given two nodes O_i and O_j connected by a t-type edge, the attention score α_{i,j} is computed as:

α_{i,j} = softmax_{j∈E_{i,t}}( f( a_t^T [W O_i ; W O_j] ) )

where [;] is the concatenation operation, a_t is the trainable attention vector for t-type edges, f(·) denotes the LeakyReLU (Maas et al., 2013) activation function, and E_{i,t} is the set of all neighboring nodes that are connected to O_i with a t-type edge.

Answer Prediction-We feed Ĥ into a two-layer pointer network to predict the answer, determining the start and end positions of the answer in X. Note that the hidden state of [CLS] in Ĥ is used to predict the "Impossible" tag, i.e., a tag signaling an unanswerable question. During training, we use the cross-entropy loss function to optimize the model.
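The sketch below illustrates one QIUG reasoning layer with relation-specific attention, following the equations reconstructed above; the exact parameter sharing and layer details of the original implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

RELATIONS = ["querying", "queried-by", "issuing", "issued-by", "replying", "replied-by"]

class RelationGATLayer(nn.Module):
    """One GAT-style layer whose attention vector a_t depends on the edge type t."""
    def __init__(self, d):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)
        self.a = nn.ParameterDict(
            {t.replace("-", "_"): nn.Parameter(torch.randn(2 * d)) for t in RELATIONS})

    def forward(self, O, edges):
        # O: (num_nodes, d) node representations.
        # edges: list of (i, j, t), meaning node j is a neighbor of node i via a t-type edge.
        WO = self.W(O)
        scores = torch.full((O.size(0), O.size(0)), float("-inf"))
        for i, j, t in edges:
            a_t = self.a[t.replace("-", "_")]
            scores[i, j] = F.leaky_relu(torch.dot(a_t, torch.cat([WO[i], WO[j]])))
        alpha = torch.nan_to_num(torch.softmax(scores, dim=1))  # rows without neighbors -> 0
        return alpha @ WO                                       # attentive information per node

layer = RelationGATLayer(d=768)
O = torch.randn(5, 768)  # e.g. question, two interlocutors, two utterances
edges = [(0, 3, "querying"), (3, 0, "queried-by"), (1, 3, "issuing"), (3, 1, "issued-by")]
O_hat = layer(O, edges)  # (5, 768)
```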

Local Graph Reasoning on LSRG
Global graph reasoning is grounded on the global relations among question, interlocutor and utterance nodes, as well as their indecomposable node representations. It barely uses the inner token-level semantic relations within every utterance for multi-hop reasoning. However, such local semantic correlations actually contribute to the reasoning process, such as the predicate-time and predicate-negation relations, as well as coreferential relations. Therefore, we construct the semantic-role graph LSRG, and use it to strengthen local graph reasoning.
LSRG-LSRG is an undirected graph which comprises the semantic-role subgraphs of all utterances in D. To obtain the subgraph of an utterance, we leverage the Allennlp-SRL parser (Shi and Lin, 2019) to extract the predicate-argument structures in the utterance, and regard predicates and arguments as the fine-grained nodes. Each predicate node is connected to its associated argument nodes with undirected role-specific edges (e.g., "ARG1-V"). Both the directly-associated and indirectly-associated arguments are considered for constructing the subgraph, as shown in Figure 1-(d).
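The sketch below shows how predicate-argument structures could be turned into role-specific edges; it assumes the SRL output has already been grouped into labeled argument spans, which is a simplification of the raw BIO-tagged output produced by the parser.

```python
# Sketch: turn per-utterance SRL frames into a semantic-role subgraph.
# Assumed frame format: {"verb": "copy", "arguments": {"ARG0": "I", "ARG1": "the data files"}}.
def srl_subgraph(frames):
    nodes, edges = set(), []
    for frame in frames:
        pred = frame["verb"]
        nodes.add(pred)
        for role, span in frame["arguments"].items():
            nodes.add(span)
            edges.append((span, pred, f"{role}-V"))  # undirected role-specific edge
    return nodes, edges

frames = [{"verb": "copy",
           "arguments": {"ARG0": "I", "ARG1": "the os", "ARGM-MOD": "can"}}]
print(srl_subgraph(frames))
```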
Given the semantic-role subgraphs of all utterances, we form LSRG using the following scheme:

• We combine the subgraphs containing similar fine-grained nodes. This is fulfilled by connecting the similar nodes. A pair of nodes is determined to be similar if their inner tokens have an overlap rate of more than 0.5 (see the sketch after this list).
• The interlocutor name is regarded as a special fine-grained node. We connect it to the fine-grained nodes in the utterances she/he issued.
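As referenced above, a minimal sketch of the token-overlap test used to decide node similarity; the tokenization and the denominator of the overlap rate are assumptions here.

```python
# Sketch: two fine-grained nodes are treated as similar when their token
# overlap rate exceeds 0.5 (computed here against the shorter node).
def similar(node_a: str, node_b: str, threshold: float = 0.5) -> bool:
    tokens_a, tokens_b = set(node_a.lower().split()), set(node_b.lower().split())
    if not tokens_a or not tokens_b:
        return False
    overlap = len(tokens_a & tokens_b)
    return overlap / min(len(tokens_a), len(tokens_b)) > threshold

print(similar("the data files", "data files"))   # True
print(similar("the os", "the icon theme"))       # False
```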
Fine-grained Node Representation-The fine-grained nodes generally contain a varied number of tokens (e.g., "can not" and "the data files"). To obtain identically-sized representations of them, we aggregate the hidden states of all tokens in each fine-grained node. Attention pooling (Santos et al., 2016) is used for aggregation.
In our experiments, there are two kinds of token-level hidden states considered for fine-grained node representation and reasoning on LSRG, including the initial case h obtained by the PLM, as well as the refined case ĥ (ĥ ∈ Ĥ) produced by global graph reasoning. When h is used, we perform local graph reasoning independently, without the collaboration of global graph reasoning. It is carried out for MDRC in an ablation study. When ĥ is used, we perform global-to-local graph reasoning. It is conducted in the comparison test. We concentrate on ĥ in the following discussion.
Multi-hop Reasoning on LSRG-It updates the hidden states of all tokens in each fine-grained node O_i, where the attentive information Ǒ of its neighboring nodes in LSRG is used for updating. Formally, the hidden state of each token is updated by: ȟ = [ĥ; Ǒ]. We use an L_2-layer graph transformer (Zhu et al., 2019) to compute Ǒ ∈ R^{1×d} as follows:

Ǒ_i^(l) = Σ_{j∈E_i} β_{i,j} ( W_o O_j^(l-1) + W_r r_{i,j} )

where E_i is the set of neighboring fine-grained nodes of the i-th node O_i in LSRG. W_o ∈ R^{d×d} and W_r ∈ R^{d×d} are trainable parameters. In addition, r_{i,j} ∈ R^{1×d} is the learnable embedding of the role-specific edge between O_i and its j-th neighboring node O_j. Before training, the edges holding the same semantic-role relation are uniformly assigned a randomly-initialized embedding. Besides, β_{i,j} is a scalar, denoting the attention score between O_i and O_j. It is computed as follows:

β_{i,j} = exp(s_{i,j}) / Σ_{m=1}^{M} exp(s_{i,m}),   s_{i,j} = ( Ẇ_o^(l) O_i ) ( Ẇ_r^(l) (O_j + r_{i,j}) )^T

where Ẇ_o^(l) and Ẇ_r^(l) are trainable parameters of dimensions R^{d×d}, and M denotes the number of neighboring fine-grained nodes in E_i.
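A sketch of one LSRG reasoning layer under the reconstructed equations is given below; scaling, multi-head attention and other details of the original graph transformer may differ.

```python
import torch
import torch.nn as nn

class RoleGraphLayer(nn.Module):
    """Graph-transformer-style layer: each neighbor's message is combined with
    the embedding of the role-specific edge connecting it to the target node."""
    def __init__(self, d, num_roles):
        super().__init__()
        self.W_o = nn.Linear(d, d, bias=False)
        self.W_r = nn.Linear(d, d, bias=False)
        self.W_o_attn = nn.Linear(d, d, bias=False)
        self.W_r_attn = nn.Linear(d, d, bias=False)
        self.role_emb = nn.Embedding(num_roles, d)   # r_{i,j}

    def forward(self, O, neighbors, roles):
        # O: (num_nodes, d); neighbors[i]: indices of E_i; roles[i]: role ids of those edges.
        out = torch.zeros_like(O)
        for i in range(O.size(0)):
            if not neighbors[i]:
                continue
            nbr = O[neighbors[i]]                          # (M, d)
            r = self.role_emb(torch.tensor(roles[i]))      # (M, d)
            logits = (self.W_o_attn(O[i]) * self.W_r_attn(nbr + r)).sum(-1)
            beta = torch.softmax(logits, dim=0)            # attention over the M neighbors
            out[i] = beta @ (self.W_o(nbr) + self.W_r(r))  # attentive information for node i
        return out

layer = RoleGraphLayer(d=768, num_roles=30)
O = torch.randn(4, 768)
O_check = layer(O, neighbors=[[1, 2], [0], [0], []], roles=[[0, 1], [0], [1], []])
```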
Question-aware Reasoning-Obviously, the LSRG-based attentive information Ǒ_i is independent of the question. To fulfill question-aware reasoning, we impose question-oriented attention upon Ǒ_i. Formally, it is updated by: Ǒ_i ⇐ γ_i · Ǒ_i, where the attention score γ_i is computed as follows:

γ_i = sigmoid( O_q W_q Ǒ_i^T )

where O_q is the representation of the question node, and W_q ∈ R^{d×d} is a trainable parameter. We feed Ȟ into the two-layer pointer network for predicting answers. Cross-entropy loss is used to optimize the model during training.
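A small sketch of the question-aware rescaling step follows; the sigmoid gate reflects the reading of the formula above and is an assumption.

```python
import torch
import torch.nn as nn

# Sketch: rescale each fine-grained node's attentive information by its
# relevance to the question node.
d = 768
W_q = nn.Linear(d, d, bias=False)
O_q = torch.randn(1, d)        # question node representation
O_check = torch.randn(20, d)   # attentive information of 20 fine-grained nodes

gamma = torch.sigmoid(O_check @ W_q(O_q).t())  # (20, 1) question-oriented attention scores
O_check = gamma * O_check                      # question-aware attentive information
```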

Datasets and Evaluation
We experiment on two benchmark datasets, including Molweni (Li et al., 2020) and FriendsQA (Yang and Choi, 2019). Molweni is an MDRC dataset manufactured using the Ubuntu Chat Corpus (Lowe et al., 2015). The dialogues in Molweni are accompanied with ground-truth CDS, as well as either answerable or unanswerable questions. FriendsQA is another MDRC dataset whose dialogues are excerpted from the TV show Friends. It is characterized by colloquial conversations. CDS is undisclosed in FriendsQA. We use Liu and Chen (2021)'s CDS parser to pretreat the dataset.
We follow the common practice (Li et al., 2020; Yang and Choi, 2019) to split Molweni and FriendsQA into training, validation and test sets. The data statistics of the sets are shown in Table 1. We use F1-score and EM-score (Li and Zhao, 2021; Ma et al., 2021) as the evaluation metrics.
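For reference, a minimal sketch of SQuAD-style EM and F1 computation over answer spans is given below; the official evaluation scripts may apply additional normalization.

```python
import re
from collections import Counter

def normalize(text):
    return re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    pred, ref = normalize(prediction), normalize(gold)
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("edit the icon theme", "edit the icon theme"))   # 1.0
print(round(f1_score("the icon theme", "edit the icon theme"), 3))
```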
The numbers of network layers of GAT and the graph transformer are set in the same way: L_1 = L_2 = 2.

Compared MDRC Models

• ULM-UOP (Li and Choi, 2020) fine-tunes BERT on a large collection of FriendsQA transcripts (known as the Character Mining dataset (Yang and Choi, 2019)) before task-specific training for MDRC. Two self-supervised tasks are used for fine-tuning, including utterance-oriented masked language modeling and utterance order prediction. In addition, BERT is trained to predict both answers and sources (i.e., IDs of the utterances containing answers).

• SKIDB (Li and Zhao, 2021) uses a multi-task learning strategy to enhance the MDRC model. Self-supervised interlocutor prediction and key-utterance prediction tasks are used within the multi-task framework.

• ESA (Ma et al., 2021) uses GCN to encode the interlocutor graph and the CDS-based utterance graph. In addition, it is equipped with a speaker masking module, which is able to highlight co-attentive information within utterances of the same interlocutor, as well as that among different interlocutors.

Main Results
Table 2 shows the test results of our GLGR models and the compared models, where different backbones (BERT_base, BERT_large and ELECTRA) are considered for the general comparison purpose. It can be observed that our GLGR model yields significant improvements compared to the PLM baselines. The most significant performance gains are 4.3% F1-score and 4.3% EM-score, which are obtained on FriendsQA compared to the lite BERT_base (110M parameters). When compared to the larger BERT_large (340M parameters) and ELECTRA (335M parameters), GLGR is able to achieve improvements of no less than 1.7% F1-score and 2.4% EM-score. In addition, GLGR outperforms most of the state-of-the-art MDRC models. The only exception is that GLGR obtains a comparable performance relative to ESA when BERT_base is used as the backbone. By contrast, GLGR substantially outperforms ESA when BERT_large and ELECTRA are used.
The test results reveal distinctive advantages of the state-of-the-art MDRC models. DADgraph is a lightweight model due to the involvement of a sole interlocutor-aware CDS-based graph. It offers the basic performance of graph reasoning for MDRC. ESA is grounded on multiple graphs, and it separately analyzes co-attentive information for subdivided groups of utterances. Multi-graph reasoning and coarse-to-fine attention perception allow ESA to be a competitive MDRC model. By contrast, ULM-UOP does not rely heavily on conversational structures. Instead, it leverages a larger dataset and diverse tasks for fine-tuning, and thus enhances the ability of BERT to understand domain-specific language at the level of semantics. It can be observed that ULM-UOP achieves performance similar to ESA. SKIDB successfully leverages multi-task learning, and it applies interesting and effective self-supervised learning tasks. Similarly, it enhances PLMs in encoding domain-specific languages, and it is not limited to BERT. It can be found that SKIDB obtains performance comparable to ESA on Molweni.
Our GLGR model combines the above advantages through external conversational structure analysis and internal semantic role analysis. On this basis, GLGR integrates global co-attentive information with local information for graph reasoning. It can be observed that GLGR shows superior performance, although it is trained without using additional data.

Ablation Study
We conduct the ablation study from two aspects. First of all, we progressively ablate global and local graph reasoning, where QIUG and LSRG are omitted accordingly. Second, we respectively ablate different classes of edges from the two graphs, i.e., disabling the corresponding structural factors during multi-hop reasoning. We refer to the former as "Graph ablation" and the latter as "Relation ablation".
Graph ablation-The negative effects of graph ablation are shown in Table 3. It can be easily found that similar performance degradation occurs when QIUG and LSRG are independently pruned. This implies that, to some extent, local reasoning on LSRG is effectively equivalent to global reasoning on QIUG. When graph reasoning is thoroughly disabled (i.e., pruning both QIUG and LSRG), the performance degrades severely.
Relation ablation-The negative effects of relation ablation are shown in Table 4. For QIUG, we condense the graph structure by respectively disabling interlocutor, question and utterance nodes. It can be found that the performance degradation rates in the three ablation scenarios are similar. This demonstrates that all the conversational structural factors are crucial for multi-hop reasoning. For LSRG, we implement relation ablation by unbinding the co-referential fine-grained nodes, omitting semantic-role relations, and removing question-aware reasoning, respectively. Omitting semantic-role relations is accomplished by full connection. It can be observed that ablating semantic-role relations causes relatively larger performance degradation rates.

The Impact of Utterance Number
We follow the common practice (Li and Zhao, 2021; Li et al., 2022) to verify the impact of the number of utterances. The FriendsQA test set is used in the experiments. It is divided into four subsets, including the subsets of dialogues containing 1) no more than 9 utterances, 2) 10∼19 utterances, 3) 20∼29 utterances and 4) no less than 30 utterances. GLGR is re-evaluated over the subsets. We illustrate the performance in Figure 3.
It can be found that the performance decreases for dialogues containing a larger number of utterances, no matter whether the baseline or GLGR is used. In other words, both models are distracted by the abundant noise in these cases. Nevertheless, we observe that GLGR is able to stem the tide of performance degradation to some extent. Therefore, we suggest that GLGR contributes to anti-noise multi-hop reasoning, although it fails to solve the problem completely.

Reliability of Two-stage Reasoning
There are two alternative versions of GLGR: one reverses the reasoning sequence over QIUG and LSRG (local-to-global graph reasoning); the other performs single-phase reasoning over a holistic graph that interconnects QIUG and LSRG. In this subsection, we intend to compare them to the global-to-local two-stage GLGR. For the single-phase reasoning version, we combine QIUG with LSRG by two steps, including 1) connecting noun phrases in the question node to the similar fine-grained nodes in utterances, and 2) connecting utterance nodes to the entities occurring in them. On this basis, we encode the resultant holistic graph using GAT, which serves as the single-phase GLGR. It is equipped with ELECTRA and the pointer network.
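For illustration, a rough sketch of how such a holistic graph could be assembled is given below; all helper names and edge labels are illustrative rather than the authors' implementation.

```python
# Sketch: merge QIUG and LSRG edge lists, link question noun phrases to
# similar fine-grained nodes (step 1), and link utterance nodes to the
# entities occurring in them (step 2).
def _similar(a, b, threshold=0.5):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return bool(ta and tb) and len(ta & tb) / min(len(ta), len(tb)) > threshold

def build_holistic_graph(qiug_edges, lsrg_edges, question_nps, fine_nodes, utterance_entities):
    edges = list(qiug_edges) + list(lsrg_edges)
    for np_ in question_nps:                           # step 1
        for node in fine_nodes:
            if _similar(np_, node):
                edges.append(("Q", node, "q-align"))
    for u, entities in utterance_entities.items():     # step 2
        for ent in entities:
            edges.append((u, ent, "contains"))
    return edges
```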
The comparison results (single-phase GLGR versus the two two-stage GLGRs) are shown in Table 5, where ELECTRA is used as the backbone. It is evident that the single-phase GLGR obtains inferior performance. This is most probably because the perception of co-attentive information among fine-grained nodes in LSRG suffers from the interference of irrelevant coarse-grained nodes in QIUG. This drawback raises the problem of combining heterogeneous graphs for multi-hop reasoning. Besides, we observe that the global-to-local reasoning method exhibits better performance compared to local-to-global graph reasoning. We attribute this to the initial local graph reasoning in the local-to-global variant, which fails to integrate the distant and important context information while focusing on local semantic information. This leads to suboptimal multi-hop reasoning and highlights the importance of the graph reasoning order in handling complex information dependencies.

Case Study
GLGR is characterized by the exploration of both global and local clues for reasoning. It is implemented by highlighting co-attentive information in coarse-grained and fine-grained nodes.
Figure 4 shows a case study of GLGR-based MDRC, where the heat maps exhibit the attention distribution over both utterance nodes and token-level nodes. There are three noteworthy phenomena in the heat maps. First, GLGR assigns higher attention scores to two utterance nodes, which contain the answer and closely-related clues, respectively. Second, both the answer and the clues are assigned higher attention scores, compared to other token-level nodes. Finally, the answer and clues emerge from different utterance nodes. This is not an isolated case, and the phenomena stand for the crucial impacts of GLGR on MDRC.

Another research branch focuses on the study of language understanding for dialogues, where self-supervised learning is used for the general-to-specific modeling and transfer of pretrained models. Li and Choi (2020) transfer BERT to the task-specific data, where two self-supervised tasks are used for fine-tuning, including Utterance-level Masked Language Modeling (ULM) and Utterance-Order Prediction (UOP). During the transfer, the larger-sized dataset of FriendsQA transcripts is used. Similarly, Li and Zhao (2021) transfer PLMs to dialogues using a multi-task learning framework, including the tasks of interlocutor prediction and key utterance prediction.

Semantic Role Labeling
In this study, we use Semantic Role Labeling (SRL) to build the LSRG for local graph reasoning. To facilitate the understanding of SRL, we present the related work as follows.
SRL is a shallow semantic parsing task that aims to recognize the predicate-argument structure of each sentence. Recently, advances in SRL have been largely driven by the development of neural networks, especially Pre-trained Language Models (PLMs) such as BERT (Kenton and Toutanova, 2019). Shi and Lin (2019) propose a BERT-based model that incorporates syntactic information for SRL. Larionov et al. (2019) design the first pipeline model for SRL of Russian texts.
It has been proven that SRL is beneficial for MRC by providing rich semantic information for answer understanding and matching. Zheng and Kordjamshidi (2020) introduce an SRL-based graph reasoning network for the task of multi-hop question answering. They demonstrate that the fine-grained semantics of an SRL graph contribute to the discovery of an interpretable reasoning path for answer prediction.

Conclusion
We propose a global-to-local graph reasoning approach for MDRC. It explores attentive clues for reasoning in both the coarse-grained graph QIUG and the fine-grained graph LSRG. Experimental results show that the proposed approach outperforms the PLM baselines and state-of-the-art models. The code is available at https://github.com/YanLingLi-AI/GLGR.
The main contribution of this study is to jointly use global conversational structures and local semantic structures during encoding. However, this can only be implemented by two-stage reasoning due to the bottleneck of in-coordinate interaction between heterogeneous graphs. To overcome the issue, we will use pretrained Heterogeneous Graph Transformers (HGT) for encoding in the future. Besides, graph-structure based pretraining tasks will be designed for task-specific transfer learning.

Limitations
While GLGR demonstrates several strengths, it also has limitations that should be considered. First, GLGR relies on annotated conversation structures, coreference, and SRL information. This dependency necessitates a complex data preprocessing process and makes the model susceptible to the quality and accuracy of the annotations. Therefore, it is important to ensure the accuracy and robustness of the annotations used in model training and evaluation. Second, GLGR may encounter challenges in handling longer dialogue contexts. Its performance may be unstable when confronted with extended and more intricate conversations. Addressing this limitation requires further investigation of stability and consistency in real application scenarios.

Figure 2: The main architecture of the two-stage encoder for the Global-to-Local Graph Reasoning approach (GLGR).

Following Ma et al. (2021)'s study, we consider the standard span-based MRC model (Kenton and Toutanova, 2019) as the baseline. We compare with a variety of state-of-the-art MDRC models:

• DADgraph (Li et al., 2021) constructs a CDS-based dialogue graph. It enables graph reasoning over conversational dependency features and interlocutor nodes. Graph Convolutional Network (GCN) is used for reasoning.

Figure 3: Performance of GLGR (ELECTRA) and the baseline on FriendsQA with different numbers of utterances.
Table 5: Comparison between the single-phase GLGR and the two two-stage GLGRs, where "rev." indicates the local-to-global reasoning version.

Figure 4 (dialogue): Qkslvrwolf: How do I get FILEPATH manager to use a different icon for all, say, folders? _jason: Probably need to edit the icon theme. C-O-L-T: Help me in installing tovid cause I can not. C-O-L-T: I am following the directions but I still can not. ... C-O-L-T: Should I try the other program? _jason: It's a dependency anyway so sure. C-O-L-T: I can not get mplayer in synaptic. Question: How does Qkslvrwolf use the different icon? Baseline: unanswerable. GLGR: edit the icon theme.

Figure 4: A case study for the ELECTRA-based GLGR.
Multi-party Dialogue Reading Comprehension

A variety of graph-based approaches have been studied for MDRC. They successfully incorporate conversational structural features into the dialogue modeling process. Ma et al. (2021) construct a provenance-aware graph to enhance the co-attention encoding of discontinuous utterances of the same interlocutor. Li et al. (2021) and Ma et al. (2022) apply CDS to bring the mutual dependency features of utterances into the graph reasoning process, where GCN (Kipf and Welling, 2017) is used for encoding. Recently, Li et al. (2022) propose a back-and-forth comprehension strategy. It decouples the past and future dialogues, and models interactive relations in terms of conversational temporality. Li et al. (2023) add coreference-aware attention modeling in PLMs to strengthen the multi-hop reasoning ability.

Table 2: Results on the test sets of Molweni and FriendsQA.