Task-Aware Self-Supervised Framework for Dialogue Discourse Parsing



Introduction
Dialogue discourse parsing (DDP) plays an essential role in natural language processing (NLP), serving as a foundational task and receiving increasing attention from the research community (Shi and Huang, 2019; Yang et al., 2021; Yu et al., 2022). The formulation of the DDP task is rooted in Segmented Discourse Representation Theory (Asher and Lascarides, 2003), distinguishing it from Rhetorical Structure Theory (Mann and Thompson, 1988) and the Penn Discourse Treebank framework (Prasad et al., 2008), which primarily underpin text-level discourse parsing (Afantenos et al., 2015). The primary objective of DDP is to recognize the links and relations between utterances in dialogues. Fig. 1 shows an example, where A and B are two speakers with subscripts indicating the turn of the dialogue, and "QAP", "CQ", "EXP" and "Comment" stand for dependency relations.
The parsed dependencies between utterances should form a Directed Acyclic Graph (DAG). For example, the dependent utterance B_3 in Fig. 1 depends on two head utterances, B_1 and A_2, in the "Explanation" and "QAP" relations, respectively. The early neural-based paradigm (Shi and Huang, 2019) for the DDP task sequentially scanned the utterances in a dialogue and then predicted dependency links and the corresponding relation types; because the learned representations relied on historical predictions, this approach is prone to severe error propagation (Wang et al., 2021). This observation inspired us to formalize DDP from a graph perspective, where the prediction of links and relations for each pair of utterances is independent of the others.
Existing methods (Wang et al., 2021) resorted to a bottom-up strategy for DDP, where for each dependent utterance the parser retrieves only one head utterance. These models (He et al., 2021) may suffer from measurement bias (Mehrabi et al., 2021), as the training labels are distorted by one-head retrieval for samples with multi-head labels. Furthermore, these multi-head labels constitute long-tailed data, and overlooking long-tailed data may result in a biased distribution estimate during training (Wang et al., 2022).
This inspires us to keep the multi-head labels for bottom-up strategy training. Moreover, Yang et al. (2021) and Yu et al. (2022) used external knowledge or other NLP tasks to mitigate algorithmic bias (Mehrabi et al., 2021) during training. Nevertheless, they omitted the potential of leveraging internal structure for regularization. While the top-down strategy can be effective in sentence parsing (Koto et al., 2021), it has rarely been exploited in DDP due to the requirement of predicting multiple dependents for each head in dialogues. Despite the opposite directions of the two strategies, there is a strong correlation between the relevance scores of the same utterance pair in both strategies. This leaves room for reciprocal structural supervision between the bottom-up and top-down strategies.
Meanwhile, as a foundational NLP task, DDP has proved beneficial for downstream tasks including dialogue summarization (Chen and Yang, 2021) and emotion recognition in conversations (ERC) (Li et al., 2022, 2023). Nevertheless, existing parsers are constrained by predefined relation types, posing a potential obstacle to the parser's adaptability for downstream tasks. For example, existing relations like narration and background do not or only rarely exist in ERC datasets. In addition, a well-designed DDP model fine-tuned with task-aware dependency labels can capture emotion shifts, which benefits the downstream ERC task.
Overall, previous methods have three limitations: error propagation, learning bias from distorted training labels and a single strategy, and incompatibility of predefined relations with downstream tasks. To this end, we propose a task-aware self-supervised framework for the DDP task. Concretely, a graph-based model, DialogDP, utilizing the biaffine mechanism (Dozat and Manning, 2017), is designed for DDP, avoiding sequential error propagation. The model consists of symmetric parsers, simultaneously performing bottom-up and top-down parsing.
We investigate the parsed link and relation graphs of the two strategies and design a bidirectional self-supervision mechanism encouraging the two strategies to acquire similar graphs through mutual learning. Moreover, we propose a soft-window triangular (SWT) mask incorporating a soft constraint, as opposed to a hard constraint (Shi and Huang, 2019), to guide the parsers. The SWT mask encourages the parsers to prioritize candidate links within a flexible window for each utterance.
To enhance the adaptability of DialogDP for downstream tasks, we propose a novel paradigm involving fine-tuning with task-specific re-annotated relations. We validate the effectiveness of our task-aware paradigm on downstream ERC. The contributions of this paper are:
• We propose a new DDP model that explicitly captures the structures of dependency graphs with bottom-up and top-down strategies, avoiding sequential error propagation.
• Bidirectional self-supervision with an SWT mask is devised to alleviate the learning bias.
• Our parser surpasses baselines on benchmark datasets, and the task-aware DialogDP demonstrates superior effectiveness in handling downstream tasks.

Related Work
DDP aims to analyze a conversation between two or more speakers to recognize the dependency structure of a dialogue. Compared with general text-level discourse parsing (Mann and Thompson, 1988; Prasad et al., 2008; Li et al., 2014b; Afantenos et al., 2015), DDP provides significant improvements for many dialogue-related downstream tasks (Ma et al., 2023; Zhang and Zhao, 2021; Chen and Yang, 2021) by introducing symbolic dialogue structure information into the modeling process.
Existing works mainly focused on applying neural models to handle problems in DDP. In detail, Shi and Huang (2019) formalized DDP as dependency-based rather than constituency-based (Li et al., 2014a) parsing. However, their sequential scan method introduced error propagation. Wang et al. (2021) proposed a structure self-aware model that produces representations independently of historical predictions to handle error propagation, yet it still encounters learning bias. Recent methods used external knowledge or other NLP tasks to mitigate bias during training. Liu and Chen (2021) utilized domain adaptation techniques to produce enhanced training data and thus improved DDP model performance. Yu et al. (2022) pointed out the lack of modeling of speaker interactions in previous works and proposed a joint learning model. Fan et al. (2022) presented a distance-aware multitask framework combining both transition- and graph-based paradigms. Nevertheless, these methods overlooked the potential of leveraging the internal structure for regularization.

In summary, despite their good performance, existing methods still have problems in modeling and application. Modeling-wise, DDP is currently limited to either a top-down or a bottom-up manner, leaving a gap in achieving bidirectionality. Application-wise, the issue arises from the constraints imposed by predefined relation types, limiting the benefits to tasks directly associated with these predefined relation types. Consequently, there is a need to establish connections between the parsed dependency relations and downstream tasks. In this paper, we propose a new framework to address these important issues.

Methodology
DialogDP is tailored for integral graph-driven discourse parsing, which is more computation-efficient and avoids error propagation in the training process. Specifically, bidirectional self-supervision allows the parser to parse the dialogue with both top-down and bottom-up strategies, and the strategies in the two directions mutually reinforce and guide each other. Moreover, the SWT mask guides the two strategies and imposes soft constraints on the integral learned graph, prompting the parser to prioritize the candidate links of each utterance within a flexible window.

Task Definition
Given a multi-turn multi-party (or dyadic) dialogue U consisting of a sequence of utterances {u_1, u_2, ..., u_n}, the goal of the DDP task is to identify links and the corresponding dependency types {(u_j, u_i, r_ji) | j ≠ i} between utterances, where (u_j, u_i, r_ji) represents a dependency of type r_ji from dependent utterance u_j to head utterance u_i. We formulate DDP as a graph spanning process, where the dependency link of the current utterance u_i is predicted by calculating a probability distribution P(u_j | u_i, j ≠ i) over the other utterances. Dependency relation type prediction is formulated as a multi-class classification task, where the probability distribution is computed as P(t | u_i, u_j, t ∈ C), with |C| the number of predefined relation types.
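To make the formulation concrete, the sketch below (illustrative only, not the authors' code) represents a parse as (dependent, head, relation) triples and checks the property that heads precede their dependents, under which the parse is trivially acyclic:

```python
# Illustrative sketch: a parsed dialogue as (dependent, head, relation)
# triples, with a helper verifying that every head precedes its dependent.
# When every head index i is strictly smaller than its dependent j, all
# arcs point backwards in time, so no cycle can exist and the parse is a DAG.

def is_valid_parse(triples, n):
    """Check that every dependency (j, i, r) satisfies 0 <= i < j < n."""
    return all(0 <= i < j < n for j, i, _ in triples)

# Fig. 1-style example: B_3 depends on both B_1 and A_2 (a multi-head case).
# Utterance indices: A_1=0, B_1=1, A_2=2, B_3=3.
parse = [(3, 1, "Explanation"), (3, 2, "QAP")]
```

A parse violating the head-precedes-dependent constraint (e.g. a link pointing forward in the dialogue) would be rejected by this check.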
The parsed dependencies between utterances constitute a DAG (Shi and Huang, 2019). However, due to the limited presence of multiple incoming relations in the STAC and Molweni datasets, most existing methods parse a dependency tree (Liu and Chen, 2021), a special type of DAG. The dependency types are the 16 predefined relations (Appendix A) specified by Asher et al. (2016). Following Li et al. (2014b), we add a root node, denoted u_0. An utterance is linked to u_0 if it is not connected to any preceding utterance.

Model Overview
To tackle error propagation in sequential scanning, we re-formalize DDP in a graph-based manner, where a link graph and a dependency-type graph are built from the scores computed by the biaffine mechanism (Dozat and Manning, 2017; Zhang et al., 2020). The parsed dependency tree is obtained by jointly decoding the two graphs. Fig. 2 illustrates the structure of our proposed DialogDP model. First, the pre-trained language model BERT (Devlin et al., 2018) is employed to generate speaker-aware and context-aware utterance-level representations. Second, a bidirectional self-supervision mechanism is designed to capture the links and relations between utterances, and an SWT mask is applied to regularize the learned graphs in an explainable manner.

Speaker-Context Encoder
We concatenate the whole dialogue into a single sequence x, where [cls] and [sp_i] are special tokens marking the start of the sequence and speaker Sp_i, respectively. Speaker-context integrated embeddings e are then obtained through PLM(x). e_i := PLM(x)_j is the speaker-context integrated embedding of utterance u_i of speaker Sp_i, where j is the position at which x_j = [sp_i].
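A minimal sketch of this input construction (an assumed format, not the authors' exact preprocessing) flattens the dialogue into one token sequence and records the position of each speaker token, from which e_i is later read off:

```python
# Hypothetical sketch of the speaker-context input construction described
# above: one [cls] token, then each utterance prefixed by its speaker token
# [sp_i]. The returned positions are where e_i would be read from PLM(x).

def build_input(utterances, speakers):
    """Return the token list and the position of each [sp_i] token."""
    tokens, positions = ["[cls]"], []
    for utt, sp in zip(utterances, speakers):
        positions.append(len(tokens))      # index of [sp_i] for utterance u_i
        tokens.append(f"[sp_{sp}]")
        tokens.extend(utt.split())
    return tokens, positions

tokens, pos = build_input(["hello there", "hi"], ["A", "B"])
```

In the real model these tokens would be mapped to ids and fed to BERT; here only the sequence layout is illustrated.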

Bidirectional Self-Supervision
DialogDP is composed of two symmetric components, a bottom-up parser and a top-down parser. Both parsers are built on the biaffine mechanism (Dozat and Manning, 2017), which has proved effective for sentence-level dependency parsing. With the bottom-up strategy, the parser calculates the biaffine attention score between a dependent and a head and selects the head with the highest score for each dependent. In contrast, the top-down parser identifies the dependents with high scores for each head. Both parsers then build a link graph and a relation graph, wherein each node represents an utterance and the arcs connect pairs of nodes within the graph.

Here, H^p ∈ R^{d×n} and r^p_i ∈ R^{d×1} are the head hidden states produced by two different multilayer perceptrons (MLPs), while h^{d'}_i ∈ R^{1×d} and r^{d'}_i ∈ R^{1×d} are the dependent hidden states produced by another two MLPs. s^a_i ∈ R^{1×n} contains the attention scores between the i-th dependent and the n heads.
s^l_i gives the likelihood of each relation class given the i-th dependent and the n heads. Similarly, the arc scores S^a_td and relation label scores S^l_td of the top-down strategy are obtained by exchanging the positions of the head hidden states and the dependent hidden states in the above formulas. w^(2) and W^(1)_l ∈ R^{d×|C|×d} are learnable parameters of the biaffine mechanism.
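The arc score computed here can be sketched in pure Python; the block below (illustrative only, with tiny dimensions, omitting the MLP feature extractors) implements the bilinear-plus-bias form of a biaffine arc scorer in the style of Dozat and Manning (2017):

```python
# Sketch of a biaffine arc score: s = d^T W h + w . h, where d is the
# dependent hidden state, h the head hidden state, W a d x d weight matrix
# and w a bias vector over the head. All names and shapes are illustrative.

def biaffine_arc_score(dep, head, W, w):
    """dep, head: length-d lists; W: d x d nested list; w: length-d list."""
    d = len(dep)
    bilinear = sum(dep[a] * W[a][b] * head[b]
                   for a in range(d) for b in range(d))
    bias = sum(w[b] * head[b] for b in range(d))
    return bilinear + bias

W = [[1.0, 0.0], [0.0, 1.0]]   # identity weight, for illustration
w = [0.5, 0.5]
score = biaffine_arc_score([1.0, 2.0], [3.0, 4.0], W, w)
```

In the model these scores are computed for all dependent-head pairs at once, yielding the n x n matrix S^a; the loop form above just makes the per-pair computation explicit.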
Observing the arc (relation) scores of both the top-down and the bottom-up strategy, we can readily infer their high relevance; in other words, S^a_bu ∝ (S^a_td)' and S^l_bu ∝ (S^l_td)', where ' denotes the transpose. Hence, we design a bidirectional self-supervision mechanism that regularizes the two strategies by assuming S^a_bu = (S^a_td)' and S^l_bu = (S^l_td)'.
The symmetric self-supervision losses are given in Eq. (4). Here KL(P||Q) = Σ_{n×n} P log(P/Q) is the Kullback-Leibler divergence (Kullback and Leibler, 1951) between two distributions, where P and Q refer to the aforementioned paired attention scores.
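A symmetric KL term of this kind can be sketched as follows (an illustrative implementation under the assumption that both score matrices are already normalised into distributions; the exact form of Eq. (4) is not reproduced here):

```python
import math

# Sketch of a symmetric self-supervision loss: KL divergence in both
# directions between two n x n score matrices P and Q, flattened into
# distributions. Assumes the inputs are already normalised.

def kl(p, q):
    """KL(p || q) over two flat distributions, skipping zero-mass entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(P, Q):
    p = [v for row in P for v in row]
    q = [v for row in Q for v in row]
    return kl(p, q) + kl(q, p)

P = [[0.4, 0.1], [0.2, 0.3]]          # e.g. bottom-up scores
Q = [[0.25, 0.25], [0.25, 0.25]]      # e.g. transposed top-down scores
loss = symmetric_kl(P, Q)
```

Summing KL in both directions makes the supervision signal symmetric, so neither strategy is treated as the fixed teacher.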

Soft-Window Triangular Mask
In this work, we assume that all head utterances appear before their dependent utterances. This aligns with the intuition that a preceding utterance cannot be induced by the current one. As shown in Fig. 2, for the bottom-up strategy the feasible attention scores between each dependent and its candidate heads are distributed in the lower triangular matrix M^l = [m^l_ij] with m^l_ij = 0 for i ≤ j, as the bottom-up strategy retrieves one head for each dependent. Similarly, the top-down strategy corresponds to the upper triangular matrix M^u = [m^u_ij] with m^u_ij = 0 for i ≥ j, as it identifies the dependents for each head.
The previous method of Shi and Huang (2019) implemented a hard window constraint, compelling the model to select links exclusively within a predetermined distance range. However, the hard window excludes all candidates whose distance from the current utterance exceeds the predefined window size.

This limitation hampers generalization in real-world scenarios. To this end, we devise a soft-window mechanism implemented by a carefully designed mask. Each element m^ω_{i,j} of the mask M^ω is calculated with f(·, min, max), the Max-Min scaling that maps each input feature to a given range [min, max], where d_{i,j} denotes |i − j|. B_0, B_1, and B_2 are distance boundaries suggested by the prior distribution of head-dependent distances, whose distribution plot is in Appendix A, and α_0 = 0.95, α_1 = 0.85, and α_2 = 0.7 are hyper-parameters.
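The exact equation for m^ω_{i,j} is not reproduced here; the sketch below is one hypothetical piecewise instantiation consistent with the description, where distances up to B_0 keep full weight and each further band is down-weighted by the α hyper-parameters (the boundary values B in the code are assumed, not from the paper):

```python
# Hypothetical soft-window weight: unlike a hard window, distant candidates
# are down-weighted rather than excluded. Band boundaries B and weights
# alphas are illustrative; the paper derives B_0..B_2 from the prior
# head-dependent distance distribution and sets alphas = 0.95, 0.85, 0.7.

def soft_window_weight(d, B=(2, 6, 12), alphas=(0.95, 0.85, 0.7)):
    """Return a soft mask weight in (0, 1] for head-dependent distance d."""
    if d <= B[0]:
        return 1.0
    if d <= B[1]:
        return alphas[0]
    if d <= B[2]:
        return alphas[1]
    return alphas[2]
```

The key property, kept by any instantiation, is that the weight is non-increasing in distance yet never reaches zero, so long-range links remain reachable.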
Combining the triangular masks with the soft-window mask, the SWT masks for the bottom-up and top-down strategies, M_bu and M_td, are obtained by Eq. (5). The scores of arcs and relations after soft-window triangular masking are O^a ∈ R^{n×n} and O^l ∈ R^{n×n×|C|}, respectively. We denote o^a_{i,j} := [O^a]_{i,j} and o^l_{i,j,k} := [O^l]_{i,j,k}.

Link and Relation Decoding
In the decoding process, the parser leverages the bottom-up strategy to predict each link and its corresponding relation. Concretely, for each dependent utterance u_i, the parser predicts its head u_j and then recognizes the relation of the predicted link.
ŷ_{i,j} denotes whether the parser predicts a link between u_i and u_j. The constraint k < i indicates that the predicted head must be an utterance preceding the current dependent, which is enforced by the triangular mask; the j-th utterance is then determined as the head by taking the argmax of o^a_{i,k} over k < i (Eq. 8). Similarly, the parser predicts the relation between u_i and its selected head u_j, with the probability distribution over all relations calculated from the masked relation scores o^l_{i,j,:} (Eq. 9).

Loss Function
We use binary cross entropy (BCE) loss for multi-label classification as the link prediction loss for both strategies, and adopt cross entropy (CE) loss for relation classification. Here we show the classification loss functions of the top-down strategy, as the losses of the two strategies are analogous.
For u_i and u_j, y_{i,j} is the binary link label and y^l_{i,j} is the relation label.
Here n_l is the number of existing links in a dialogue. While the bottom-up strategy selects just one head per dependent during decoding, it is trained with a multi-label classification objective. This is essential because a small set of dependents may have multiple ground-truth heads. Our method capitalizes on this to fully exploit the instructive dependent-head label information, which is often ignored by other methods. In summary, the total loss of the framework is the weighted sum of the classification and supervision losses:

L = L^a_bu + L^l_bu + L^a_td + L^l_td + λ_a L^a_s + λ_l L^l_s    (12)

Here λ_a and λ_l are trade-off parameters.
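The total loss of Eq. (12) can be sketched as a straightforward weighted sum (illustrative only; the per-term BCE below is the standard scalar form, not the paper's exact batched implementation):

```python
import math

# Sketch of Eq. (12): four classification losses (link + relation for each
# strategy) plus the two self-supervision terms weighted by lambda_a, lambda_l.

def bce(p, y):
    """Binary cross entropy for one predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def total_loss(cls_losses, sup_a, sup_l, lam_a=0.05, lam_l=0.05):
    """cls_losses: [L_a_bu, L_l_bu, L_a_td, L_l_td], as in Eq. (12)."""
    return sum(cls_losses) + lam_a * sup_a + lam_l * sup_l

loss = total_loss([0.4, 0.6, 0.5, 0.7], sup_a=2.0, sup_l=1.0)
```

With the paper's setting λ_a = λ_l = 0.05, the self-supervision terms act as a light regulariser rather than a dominant objective.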

Task-Aware Dialogue Discourse Parsing
Previous research on DDP followed the original definition of 16 relations regardless of the application scenario. However, this relation taxonomy may not fit downstream tasks, e.g., ERC. Hence, we propose a task-aware DDP paradigm to improve adaptability to downstream tasks. Fig. 3 illustrates the paradigm; its final step (e) predicts dependency links and relations with DialogDP for the test dataset and incorporates the results into the input features of downstream tasks.
Algorithm 1: Task-aware annotation
Data: downstream dialogue U^d with labels L ∈ N^+ ∪ {0}, predicted links Y
Result: task-aware relations T^d
1: define the state transition mapping g(l_i, l_j), (l_i, l_j ∈ L);
2: while y_{i,j} in Y and y_{i,j} ≠ 0 do
3:   t^d_{i,j} ← g(l_i, l_j);
4: end

Taking the downstream ERC task as an example, task-aware DDP leverages state transitions to represent emotion shifts, which can enhance ERC performance (Gao et al., 2022) and contribute to research in the psychology domain (Winkler et al., 2023). The state transition mapping g (Eq. 13) from head u^d_i to u^d_j is obtained from their target labels l_i and l_j, where l_i, l_j ∈ {0, 1, 2} correspond to the {negative, neutral, positive} sentiment polarities.
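Algorithm 1 can be sketched as follows, with a hypothetical transition mapping g that simply names the (head polarity -> dependent polarity) pair; the exact form of g in Eq. (13) is not reproduced:

```python
# Sketch of Algorithm 1 with a hypothetical mapping g: the task-aware
# relation of a predicted link is the polarity transition between its
# endpoints, so polarity shifts such as neutral -> negative become explicit.

POLARITY = {0: "negative", 1: "neutral", 2: "positive"}

def g(li, lj):
    """Hypothetical state transition mapping from head label to dependent label."""
    return f"{POLARITY[li]}->{POLARITY[lj]}"

def task_aware_annotate(links, labels):
    """links: {(i, j): y_ij} predicted link indicators;
    labels: sentiment label per utterance. Skips y_ij == 0 as in Algorithm 1."""
    return {(i, j): g(labels[i], labels[j])
            for (i, j), y in links.items() if y != 0}

relations = task_aware_annotate({(0, 1): 1, (1, 2): 0}, [1, 0, 2])
```

With three polarities this yields at most nine task-aware relation types, replacing the original 16-relation taxonomy for the ERC fine-tuning stage.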

Dataset
We evaluated DialogDP on two publicly available datasets, STAC (Asher et al., 2016) and Molweni (Li et al., 2020). STAC is collected from an online game, The Settlers of Catan. It consists of 1,173 annotated dialogues, divided into 1,062 dialogues for training and 111 for testing. Molweni is collected from the Ubuntu Chat Corpus (Lowe et al., 2015). It consists of 9,000 annotated training instances, along with 500 instances for development and another 500 for testing. We pre-processed both datasets following Shi and Huang (2019).

Setups and Metrics
We used BERT-large from HuggingFace as the speaker-context encoder. The optimizer is AdamW (Loshchilov and Hutter, 2018) with initial learning rates of 1e-5 and 5e-6 for DialogDP and the BERT encoder, respectively. The maximum gradient-clipping value is 10. Both λ_a and λ_l are set to 0.05. The dropout rate is 0.33, following the default setting of the biaffine parser. Following Shi and Huang (2019), we report micro F1 scores for LINK prediction and L&R prediction, respectively; in L&R, both the link and its corresponding relation must be correctly classified.
The experiments were performed on a V100 GPU with 16 GB of memory. To provide reliable results, we conducted three random runs on the test sets and report the average scores.

Main Results
We report the DDP results of DialogDP and the baselines on the STAC and Molweni datasets. In Table 1, DialogDP outperforms all baselines on L&R and achieves comparable LINK prediction results. We attribute the weak L&R of the baselines to two factors: the link predictor and the relation classifier. The F1 scores on Molweni show that the majority of the baselines are weak in both link prediction and relation classification. The results on STAC indicate that some baselines have a relatively stronger link predictor yet perform poorly on L&R, because their weaker relation classifiers may fail to identify the relations of predicted links even when the links are determined accurately. Shi and Huang (2019) set a fixed window in their model, which reduced the complexity of link prediction in DDP yet compromised the model's capacity to capture relations in long-range dependencies.

Ablation Study
We conducted ablation studies to further explore the functions of the essential components in DialogDP, namely the bidirectional self-supervision mechanism and the soft-window mask (SWM). Specifically, if L&R_Sup is removed from DialogDP, the F1 scores on the four indices drop by 1.1% on average. Without SWM, the performance decreases by 1.15% on average. The results of the plain Biaffine model highlight the significance of incorporating both the bidirectional self-supervision and SWM mechanisms.
The average F1 drops of w/o Link_Sup on both LINK (0.55%) and L&R (1.4%) reveal that link supervision benefits both link prediction and overall dependency parsing. This observation is in line with our intuition, as the accuracy of subsequent relation classification depends on that of the predicted links. Furthermore, we observe that the inclusion of relation supervision also improves link prediction, suggesting a reciprocal influence between link prediction and relation classification. The results of Sup BU and Sup TD show that DialogDP is regularized by the bidirectional self-supervision, which reduces the training bias.

TA-DialogDP on ERC task
We conducted experiments on the ERC dataset MELD (Poria et al., 2019) to investigate the proposed task-aware paradigm. Specifically, we selected SKIER (Li et al., 2023) as the backbone to verify the effectiveness of the dialogue dependency graphs generated by the task-aware mechanism, as SKIER explicitly leverages parsed trees for ERC. In Table 3, the DialogDP-based models significantly outperform SKIER w/o TL, and the DialogDP models in the task-aware setup even achieve results comparable with SKIER, especially on 3-class sentiment analysis.
Our observations indicate that, in general, the task-aware setup provides a slight benefit to DialogDP, except for TA-DialogDP (STAC) on the 7-class ERC task. We believe this may be attributed to the fact that the task-aware relation annotation focuses on polarity shifts without differentiating emotion shifts. We also employed ChatGPT for ERC. Overall, the performance of ChatGPT lags significantly behind that of SKIER, primarily due to the zero-shot setting. However, the inclusion of task-aware prompts helps align ChatGPT with downstream ERC through task-specific relations, whereas prompts integrated with DDP may present a challenge for ChatGPT in comprehending the fundamental relation types.

Table 3: Results on the downstream ERC task. The metric is the weighted average F1 score. Bold text marks the best performance and underlining the second best. Subscripts in brackets denote the corpus used to train the parser. * means the model was fine-tuned on manually annotated relations; the results of SKIER* serve as a reference point, not for direct comparison. w/o TL means without manually annotated relations.

Case Study
As shown in Fig. 4, our model surpasses ChatGPT in L&R. In detail, A(0)->B(2), A(0)->D(4), and D(4)->D(6) do not appear in the ground truth but are predicted by ChatGPT. Moreover, ChatGPT fails on all QAP (label 1) pairs. The attention scores of DialogDP exhibit a notable distinction between correct links (dark) and incorrect links (light). Compared with DeepSequential, our graph-based DialogDP avoids sequential error propagation through a parallel attention mechanism.

Effect of Dialogue Length
We examined the effect of dialogue length on parsing performance. As shown in Fig. 5 (a), the trend on STAC is a gradual decline in performance as dialogue length increases. The reason is that long dialogues encompass a greater number of links and relation types, resulting in more complicated parsing trees, which adversely affect the parser and lower its F1 scores. Additionally, the performance of L&R declines more rapidly than that of LINK prediction, suggesting that long-term dependencies also present significant challenges for the relation classifier. In Fig. 5 (b), we did not observe a comparable trend, as the lengths of Molweni dialogues are concentrated within the limited range of [7, 14], which may benefit the parser.

Effect of Dialogue Turn
We further investigated the performance of the baselines and our DialogDP at different dialogue turns on the Molweni dataset. Fig. 6 shows a downward trend for all models: as the dialogue turn increases, the search space of bottom-up methods expands, leading to a decline in L&R prediction accuracy. DeepSequential displays the consistently lowest accuracy across all dialogue turns, largely attributable to error propagation. Meanwhile, the other two models eliminate this issue by avoiding the use of historical predictions when processing the current utterance. Unlike the model of Wang et al. (2021) in Fig. 6, DialogDP shows a gentle-slope downward trend and exceeds Wang et al. (2021) at all dialogue turns except the 9th. Additionally, DialogDP has two local minima, at the 5th turn (48.0%) and the 9th turn (43.1%), both higher than the corresponding minima of Wang et al. (2021) at the 5th (40.0%) and 8th (41.5%) turns. Given that Wang et al. (2021) employed graph-based techniques (Cai and Lam, 2020) to mitigate error propagation, our model demonstrates superior performance on this issue: the bidirectional self-supervision improves the stability of predictions by jointly utilizing the top-down and bottom-up dependency parsing graphs.

Conclusion
In this paper, we proposed a bidirectional self-supervised dialogue discourse parser, DialogDP, and a task-aware paradigm combining DDP with the downstream ERC task. First, we designed two graph-based parsers leveraging both bottom-up and top-down parsing strategies, eliminating sequential error propagation. A bidirectional self-supervision mechanism then reduces the learning bias by exploiting the structural symmetry of the two strategies, avoiding reliance on external knowledge. Furthermore, a soft-window triangular mask, tailored with statistical information, is utilized to effectively handle long-term dependencies. Second, we presented a task-aware paradigm bridging the gap between foundational DDP and downstream tasks through fine-tuning DialogDP with task-aware dependency relation annotations. Empirical studies on DDP and downstream ERC show the superiority and adaptability of DialogDP and the task-aware paradigm.

Limitations
Due to the lack of theoretical support, it is challenging to design task-aware dependency relations. The design of task-specific dependency relations is expected to exert a substantial influence on the performance of the downstream tasks integrated with DDP. Hence, theoretical analysis is encouraged in order to devise task-specific relations better suited to the intended downstream tasks. This paper primarily concentrates on the ERC task as an illustrative example for validating the effectiveness of the task-aware paradigm; additional comprehensive evaluations would be beneficial for thoroughly assessing the proposed task-aware paradigm.

B The application of ChatGPT to DDP task
This section details the usage of ChatGPT on the DDP task. To prompt ChatGPT to parse a dialogue under specific requirements, we begin by providing an example dialogue that includes the predefined dependency relations and a parsed dependency tree. Subsequently, we input a dialogue that we ask ChatGPT to parse accordingly. Below is the prompting example: "Given the dialogue history, please predict the discourse parsing based on their semantic relevance and logic flow as follows: speaker:

C The application of ChatGPT to ERC task with dependencies from task-aware DialogDP

Input of System content: "You are an expert in sentiment analysis. The given head utterances may influence the emotion of the current utterance. Please identify the emotion label for the current utterance with one of the pre-defined emotion labels. The emotion labels are [neutral, surprise, fear, sadness, joy, disgust, anger]." Task-aware dependency-based prompt: prompt = "Dialogue History: U, Relation: r, Head: u_j, Current utterance: u_i, Emotion label for current utterance is:".

Figure 1 :
Figure 1: An example of a dialogue dependency graph for a DailyDialog (Li et al., 2017) conversation snippet. Here A and B are two speakers, with subscripts indicating the turn of the dialogue. "QAP", "CQ", "EXP" and "Comment" stand for dependency relations.

Figure 3 :
Figure 3: Framework of task-aware DDP. D/S denotes downstream. The crossed-out mark means not using the predefined relation labels. The backward arrow from D/S Task to Task-aware DialogDP corresponds to step (c) in subsection 3.8.
Yang et al. (2021) presented a unified framework, DiscProReco, to jointly learn dropped pronoun recovery and DDP. Yu et al. (2022) proposed a speaker-context interaction joint encoding model, taking the interactions between different speakers into account. Fan et al. (2022) combined the advantages of both transition- and graph-based paradigms.

Figure 4 :
Figure 4: Case studies on a STAC dialogue. In the lower triangular attention matrix of DialogDP, shades varying from light blue to dark blue depict link prediction scores ranging from 0 to 1.

Figure 5 :
Figure 5: The predicted F1 scores of DialogDP on the STAC and Molweni test sets.

Figure 6 :
Figure 6: Comparison of prediction accuracy between typical methods at different dialogue turns.

Figure
Figure 8: The distance distribution of the Molweni dataset

Dependency Relations: Comment, Q-Elab, Conditional, Acknowledgement, Elaboration, Background, Contrast, Alternation, Narration, Correction, Parallel, Continuation, QAP, Explanation, Result, Clarification_Q

Table 1 :
Main results on the STAC and Molweni datasets. "*" denotes re-run results. L&R refers to LINK&RELATION. Bold text marks the best performance and underlining the second best.

Table 4 :
The Predefined Dependency Relations

speaker: A, text: i have enough to build a settlement too now... ?, turn: 0
speaker: A, text: and it won't let me ,