Multi-Granularity Information Interaction Framework for Incomplete Utterance Rewriting

Recent approaches to Incomplete Utterance Rewriting (IUR) fail to capture the source of the important words needed to edit the incomplete utterance, and consequently introduce words from irrelevant utterances. We propose a novel and effective multi-task information interaction framework, including context selection, edit matrix construction, and relevance merging, to capture multi-granularity semantic information. Benefiting from fetching the relevant utterance and identifying the important words, our approach outperforms existing state-of-the-art models on the two benchmark datasets in this field, Restoration-200K and CANARD. Code will be provided at \url{https://github.com/yanmenxue/QR}.


Introduction
Recently, increasing attention has been paid to multi-turn dialogue modeling (Choi et al., 2018; Reddy et al., 2019; Sun et al., 2019). The major challenge in this field is that speakers tend to use incomplete utterances for brevity, such as referring back to (i.e., co-reference) or omitting (i.e., ellipsis) entities or concepts that appear in the dialogue history. Su et al. (2019) show that ellipsis and co-reference exist in more than 70% of dialogue utterances. To handle this, the task of Incomplete Utterance Rewriting (IUR) (Pan et al., 2019; Elgohary et al., 2019) has been proposed: rewrite an incomplete utterance into an utterance that is semantically equivalent but self-contained, so that it can be understood without context.
To maintain the similarity of semantic structure between the incomplete utterance and the rewritten utterance, recent approaches formulate IUR as a word edit task (Liu et al., 2020; Zhang et al., 2022) and predict the edit types by capturing the semantic relations between words. However, the sentence-level semantic relations between the contextual utterances and the incomplete utterance are neglected. Unaware of which sentences contain the words needed to rewrite the incomplete utterance (important words) (Inoue et al., 2022), these models introduce incorrect words from irrelevant sentences into the rewritten utterance.

Turns | Utterance
u1    | Parsons studied biology at Amherst college.
u2    | Who is one of his professors at Amherst?
u3    | Parsons' biology professors at Amherst were Glaser and Henry.
u4    | What are his interest?
u5    | Parsons showed from early on, a great interest in the topic of philosophy,
u6    | Anything else he was interested to?
u*6   | Other than philosophy, is there anything else parsons was interested in?
RUN   | Besides biology Glaser and Henry, anything else was he interested to?

Table 1: One example from the CANARD dataset. u1-u5 are five turns of contextual utterances, u6 denotes the incomplete utterance, u*6 denotes the golden rewritten utterance, and "RUN" denotes the incorrect result of the baseline (Liu et al., 2020).
Consider the example in Table 1. The incomplete utterance exhibits both co-reference ("he") and ellipsis ("anything else"). Because the baseline model RUN (Liu et al., 2020) does not fetch the correct source sentence (u5) and the important word ("philosophy"), it introduces irrelevant words ("Glaser and Henry") into the rewritten utterance.
To identify the source sentences of important words and incorporate the sentence-level relations between the contextual utterances and the incomplete utterance, we propose a multi-granularity information capturing framework for IUR. First, we classify the sentences in the context as relevant or irrelevant utterances and match the rewritten utterance with the relevant contexts. Then we capture the token-level semantic relations to construct the token edit matrix. The predicted relevances between the sentences in the context and the incomplete utterance, which encode the sentence-level relations, are used to mask the edit matrix. The rewritten utterance is derived by referring to the token matrix. We conduct experiments on two benchmark IUR datasets: we outperform the prior state-of-the-art by about 1.0 point on the Restoration-200K dataset and achieve competitive performance on the CANARD dataset across different metrics, including BLEU, ROUGE, and F-score.
Our contributions can be summarized as follows: 1. We are the first to incorporate sentence-level semantic relations between the utterances in the context and the incomplete utterance, enhancing the ability to seize the source sentence and identify the important words. 2. We propose a multi-task information interaction framework to capture multi-granularity semantic information. Our approach outperforms existing methods on the benchmark dataset of this field, becoming the new state-of-the-art.

Related Work
There are two main streams of approaches to the task of IUR: generation-based (Huang et al., 2021; Inoue et al., 2022) and edit-based (Liu et al., 2020; Si et al., 2022). Generation-based models treat the task as a seq2seq problem. Su et al. (2019) utilize a pointer network to predict the probability of each token in the rewritten utterance coming from the contexts or the incomplete utterance. Hao et al. (2021) formulate the task as sequence tagging to reduce the search space. Huang et al. (2021) combine a source sequence tagger with an LSTM-based decoder to maintain grammatical correctness. Generation-based models, however, fail to capture a key trait of IUR: the main semantic structure of a rewritten utterance is usually similar to that of the original incomplete utterance.
Edit-based models focus on predicting word-level or span-level edit types between the contextual utterances and the incomplete utterance. Liu et al. (2020) formulate the task as semantic segmentation and propose a U-Net (Ronneberger et al., 2015; Oktay et al., 2018) to encode the word-level semantic relations. To incorporate the rich information in the self-attention weights of a pretrained language model (PTM) (Devlin et al., 2018), Zhang et al. (2022) directly take the self-attention weight matrix as the word-to-word edit matrix. Though these models achieve competitive performance, the sentence-level semantic relations are neglected, and the models tend to introduce incorrect words from irrelevant contextual utterances.

Overview
As shown in Figure 1, our approach contains four components: context selection, edit matrix construction, relevance merging, and intention check.

Context Selection
In this part, we capture the semantic relations between the contextual utterances and the incomplete utterance. Following RUN, we utilize BERT to encode the contextual representations of the contexts and the incomplete utterance. The input to the PTM is the sequence of contexts concatenated with the incomplete utterance, with the "[SEP]" token separating different sentences. Each utterance representation is derived by pooling the hidden states of the words it contains:

c_i = Pool({w_k : w_k ∈ u_i}),    u = Pool({w_k : w_k ∈ incomplete utterance}),

where c_i denotes the representation of the i-th utterance in the contexts, u denotes the representation of the incomplete utterance, and w_k denotes the hidden state of the k-th word token in the input sequence. We apply an MLP classifier to predict the relevance between each utterance in the contexts and the incomplete utterance:

r_i = σ(MLP([c_i; u])),    L_sel = −Σ_i (R_i log r_i + (1 − R_i) log(1 − r_i)),    (1)

where r_i and R_i denote the predicted relevance and the label of the i-th utterance in the contexts, respectively. Since no golden label is provided, we take the utterances that contain the words used to rewrite the incomplete utterance as the labeled relevant utterances. To enhance the compatibility of the selected contextual utterances with the rewritten utterance, we sample negative utterances from the contexts of other cases in the batch and predict the match score:

m_i = σ(MLP([c_i; u])),    L_match = −Σ_i (M_i log m_i + (1 − M_i) log(1 − m_i)),    (2)

where m_i and M_i denote the predicted and labeled matching scores of the i-th contextual utterance, and M_i = 1 for utterances in C_P and 0 for those in C_N; C_P and C_N denote the golden contextual utterances that include important words and the sampled negative utterances, respectively.

Edit Matrix Construction
Following Liu et al. (2020), we predict the token-to-token edit type (Insert, Replace, None) based on the word representations from the PTM with a U-Net, and build the word edit matrix. The entry at row i and column j in the matrix denotes the edit type between the i-th token in the contexts and the j-th token in the incomplete utterance. It can be formulated as:

e_ij = Softmax(UNet(F)_ij),    1 ≤ i ≤ N_C, 1 ≤ j ≤ N_U,

where e_ij ∈ R^3 denotes the predicted probabilities of the 3 edit types between the i-th token in the contexts and the j-th token in the incomplete utterance, F is a pairwise feature map built from the similarities of the word representations, and N_C and N_U denote the context length and the incomplete utterance length, respectively.
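The shape of the edit matrix can be illustrated with a toy per-pair classifier in place of the U-Net; `score_fn` is a hypothetical stand-in that maps a token pair to three logits.

```python
import numpy as np

EDIT_TYPES = ("None", "Insert", "Replace")

def build_edit_matrix(ctx_states, inc_states, score_fn):
    """Return an (N_C, N_U, 3) array of edit-type probabilities; entry
    (i, j) scores the i-th context token against the j-th token of the
    incomplete utterance. The paper applies a U-Net over a pairwise
    feature map instead of scoring pairs independently."""
    n_c, n_u = len(ctx_states), len(inc_states)
    probs = np.zeros((n_c, n_u, 3))
    for i in range(n_c):
        for j in range(n_u):
            logits = np.asarray(score_fn(ctx_states[i], inc_states[j]), dtype=float)
            exp = np.exp(logits - logits.max())  # numerically stable softmax
            probs[i, j] = exp / exp.sum()
    return probs
```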

Relevance Merging
If an utterance in the contexts is classified as relevant by our context selection module, the words in this utterance are more likely to be adopted when editing the incomplete utterance. In this part, the relevance confidence of the utterance is equally added to the predicted probabilities of the edit types Insert and Replace for all its constituent words paired with the words in the incomplete utterance:

ê_ij[t] = e_ij[t] + α · r_{s(i)},    t ∈ {Insert, Replace},

where s(i) denotes the index of the contextual utterance containing the i-th token, E_ij denotes the golden edit type between the i-th token in the contexts and the j-th token in the incomplete utterance used to supervise ê_ij, and α denotes a parameter to tune. The relevance merging process can be seen as using the predicted relevance to "softly mask" the edit matrix: even if a sentence is not selected as a relevant context, the probabilities of the edit types Insert and Replace are not strictly set to zero.
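A minimal sketch of the soft mask, assuming the edit probabilities are stored as an (N_C, N_U, 3) array with channels ordered (None, Insert, Replace), and a `token2utt` map from context-token index to utterance index (both names are illustrative):

```python
import numpy as np

def merge_relevance(edit_probs, relevance, token2utt, alpha=0.5):
    """Softly boost the Insert (channel 1) and Replace (channel 2)
    scores of every context token by alpha times the predicted
    relevance of the utterance that contains it."""
    merged = edit_probs.copy()
    for i in range(merged.shape[0]):
        boost = alpha * relevance[token2utt[i]]
        merged[i, :, 1] += boost
        merged[i, :, 2] += boost
    return merged
```

Because the boost is additive rather than a hard zero-out, tokens from utterances classified as irrelevant keep a non-zero chance of being selected.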
Intention Check

The fourth component checks the intention consistency between the rewritten utterance and the contexts with a KL-divergence loss, where C denotes the pooled representations of the utterances in the contexts and KL denotes the KL-divergence function.

Training and Rewriting
The final training loss is computed as the weighted sum of the edit loss, selection loss, matching loss, and intention loss:

L = L_edit + α_1 L_sel + α_2 L_match + α_3 L_int,

where α_i, i = 1, 2, 3, denote parameters to tune. The rewritten utterance is produced from the predicted edit matrix [ê_ij], 1 ≤ i ≤ N_C, 1 ≤ j ≤ N_U. That is, if ê_ij = Insert, we insert the i-th token in the contexts before the j-th token in the incomplete utterance, and similarly for the Replace type.
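The rewriting step can be sketched as below, assuming a discretized edit matrix with string labels; how multiple simultaneous Replace hits are resolved is not specified in the paper, so this sketch simply keeps the last one.

```python
def apply_edits(ctx_tokens, inc_tokens, edits):
    """Rewrite inc_tokens with a (len(ctx) x len(inc)) edit matrix.
    "Insert" places a context token before the incomplete-utterance
    token; "Replace" substitutes for it; "None" leaves it alone."""
    out = []
    for j, tok in enumerate(inc_tokens):
        replacement = None
        for i, c in enumerate(ctx_tokens):
            if edits[i][j] == "Insert":
                out.append(c)
            elif edits[i][j] == "Replace":
                replacement = c  # simplification: keep the last match
        out.append(replacement if replacement is not None else tok)
    return out
```

For instance, replacing a pronoun with its antecedent and inserting an elided topic word reconstructs a self-contained utterance from the incomplete one.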

Experimental Setup
Datasets: We experiment on two benchmark IUR datasets in different languages: Restoration-200K (Chinese; Pan et al., 2019) and CANARD (English; Elgohary et al., 2019).

Metrics: Following Liu et al. (2020), we use BLEU (Papineni et al., 2002), ROUGE (Lin, 2004), and F-score (Pan et al., 2019) as evaluation metrics.

Baselines: We compare our approach with the following recent competitive approaches. RUN (Liu et al., 2020) formulates IUR as a semantic segmentation task and obtains much faster inference than generating from scratch; it serves as our backbone. SARG (Huang et al., 2021) proposes a semi-autoregressive generator with high efficiency and flexibility. RAU (Zhang et al., 2022) directly extracts the co-reference and omission relationships from the self-attention weight matrix of the transformer. RAST (Hao et al., 2021) proposes a novel sequence-tagging-based model to reduce the search space. QUEEN (Si et al., 2022) designs an explicit query template to bring guided semantic structural knowledge. HCT (Jin et al., 2022) constructs a hierarchical context tagger that mitigates the multiple-spans issue by predicting slotted rules.

Results
As shown in Tables 2 and 3, compared with our backbone RUN, our model improves by about 1.0 ROUGE and BLEU score and 2.5 F-score on the Restoration-200K dataset, as well as 2.0 ROUGE and BLEU score on the CANARD dataset. This demonstrates the effectiveness of our multi-task framework in fetching the source of important words and avoiding irrelevant words. Specifically, we outperform QUEEN by 0.7 BLEU1, 0.6 BLEU2, 0.7 ROUGE1, and 0.3 ROUGE2, and beat RAU by about 1.0 F-score on the Restoration-200K dataset, becoming the new state-of-the-art. Moreover, our model shows competitive performance on the CANARD dataset, beating QUEEN by 0.5 BLEU1.

Ablation Study
To explore the contribution of each module, we conduct an ablation study. In contrast to the "soft mask" of relevance merging in Section 3.4, we design an ablation with "hard mask" merging: if a sentence in the contexts is classified as irrelevant in Section 3.2, the probabilities of Insert and Replace between its words and the words in the incomplete utterance are set to 0. In Table 4, context selection, relevance merging, and intention checking show progressive improvements across different metrics. Compared with the hard mask of relevance merging, the soft mask method is better overall: a sentence classified as irrelevant may still contain important words, so merging its relevance in a soft way is necessary. The context matching module helps the approach capture the important words from the contexts.
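The hard-mask ablation can be sketched as follows, again assuming the edit probabilities are stored as an (N_C, N_U, 3) array with channels (None, Insert, Replace) and an illustrative `token2utt` map from context-token index to utterance index:

```python
import numpy as np

def hard_mask(edit_probs, relevance, token2utt, threshold=0.5):
    """Ablation variant: zero out the Insert/Replace probabilities of
    every token belonging to an utterance whose predicted relevance is
    below threshold, so irrelevant sentences can never contribute words."""
    masked = edit_probs.copy()
    for i in range(masked.shape[0]):
        if relevance[token2utt[i]] < threshold:
            masked[i, :, 1] = 0.0
            masked[i, :, 2] = 0.0
    return masked
```

Unlike the soft variant, a misclassified utterance here loses any chance of contributing its words, which explains the observed drop when important words sit in sentences scored as irrelevant.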

Conclusion
In this paper, we argue that capturing the source of important words can reduce the introduction of irrelevant words in IUR. We propose a novel and effective multi-task framework, including context selection, edit matrix construction, and relevance merging, to capture multi-granularity semantic information and fetch the relevant utterance. We conduct experiments on two benchmark IUR datasets and show competitive performance.

Limitations
We propose a novel and effective multi-task framework to capture multi-granularity semantic information and fetch the relevant utterance. With the growing popularity of large language models (LLMs), our multi-task finetuning framework may bring more computation cost. We will explore combining our approach with LLMs in the future.
Figure 1: Model pipeline. Our model contains 4 parts: context selection, edit matrix construction, relevance merging, and intention check.