Hainan Zhang


2022

pdf bib
From spoken dialogue to formal summary: An utterance rewriting for dialogue summarization
Yue Fang | Hainan Zhang | Hongshen Chen | Zhuoye Ding | Bo Long | Yanyan Lan | Yanquan Zhou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Due to the dialogue characteristics of unstructured contexts and multi-parties with first-person perspective, many successful text summarization works have failed when dealing with dialogue summarization. In dialogue summarization task, the input dialogue is usually spoken style with ellipsis and co-references but the output summaries are more formal and complete. Therefore, the dialogue summarization model should be able to complete the ellipsis content and co-reference information and then produce a suitable summary accordingly. However, the current state-of-the-art models pay more attention on the topic or structure of summary, rather than the consistency of dialogue summary with its input dialogue context, which may suffer from the personal and logical inconsistency problem. In this paper, we propose a new model, named ReWriteSum, to tackle this problem. Firstly, an utterance rewriter is conducted to complete the ellipsis content of dialogue content and then obtain the rewriting utterances. Then, the co-reference data augmentation mechanism is utilized to replace the referential person name with its specific name to enhance the personal information. Finally, the rewriting utterances and the co-reference replacement data are used in the standard BART model. Experimental results on both SAMSum and DialSum datasets show that our ReWriteSum significantly outperforms baseline models, in terms of both metric-based and human evaluations. Further analysis on multi-speakers also shows that ReWriteSum can obtain relatively higher improvement with more speakers, validating the correctness and property of ReWriteSum.

2021

pdf bib
Augmenting Knowledge-grounded Conversations with Sequential Knowledge Transition
Haolan Zhan | Hainan Zhang | Hongshen Chen | Zhuoye Ding | Yongjun Bao | Yanyan Lan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Knowledge data are massive and widespread in the real-world, which can serve as good external sources to enrich conversations. However, in knowledge-grounded conversations, current models still lack the fine-grained control over knowledge selection and integration with dialogues, which finally leads to the knowledge-irrelevant response generation problems: 1) knowledge selection merely relies on the dialogue context, ignoring the inherent knowledge transitions along with conversation flows; 2) the models often over-fit during training, resulting with incoherent response by referring to unrelated tokens from specific knowledge content in the testing phase; 3) although response is generated upon the dialogue history and knowledge, the models often tend to overlook the selected knowledge, and hence generates knowledge-irrelevant response. To address these problems, we proposed to explicitly model the knowledge transition in sequential multi-turn conversations by abstracting knowledge into topic tags. Besides, to fully utilizing the selected knowledge in generative process, we propose pre-training a knowledge-aware response generator to pay more attention on the selected knowledge. In particular, a sequential knowledge transition model equipped with a pre-trained knowledge-aware response generator (SKT-KG) formulates the high-level knowledge transition and fully utilizes the limited knowledge data. Experimental results on both structured and unstructured knowledge-grounded dialogue benchmarks indicate that our model achieves better performance over baseline models.

pdf bib
CoLV: A Collaborative Latent Variable Model for Knowledge-Grounded Dialogue Generation
Haolan Zhan | Lei Shen | Hongshen Chen | Hainan Zhang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Knowledge-grounded dialogue generation has achieved promising performance with the engagement of external knowledge sources. Typical approaches towards this task usually perform relatively independent two sub-tasks, i.e., knowledge selection and knowledge-aware response generation. In this paper, in order to improve the diversity of both knowledge selection and knowledge-aware response generation, we propose a collaborative latent variable (CoLV) model to integrate these two aspects simultaneously in separate yet collaborative latent spaces, so as to capture the inherent correlation between knowledge selection and response generation. During generation, our proposed model firstly draws knowledge candidate from the latent space conditioned on the dialogue context, and then samples a response from another collaborative latent space conditioned on both the context and the selected knowledge. Experimental results on two widely-used knowledge-grounded dialogue datasets show that our model outperforms previous methods on both knowledge selection and response generation.

pdf bib
Adaptive Bridge between Training and Inference for Dialogue Generation
Haoran Xu | Hainan Zhang | Yanyan Zou | Hongshen Chen | Zhuoye Ding | Yanyan Lan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Although exposure bias has been widely studied in some NLP tasks, it faces its unique challenges in dialogue response generation, the representative one-to-various generation scenario.In real human dialogue, there are many appropriate responses for the same context, not only with different expressions, but also with different topics. Therefore, due to the much bigger gap between various ground-truth responses and the generated synthetic response, exposure bias is more challenging in dialogue generation task.What’s more, as MLE encourages the model to only learn the common words among different ground-truth responses, but ignores the interesting and specific parts, exposure bias may further lead to the common response generation problem, such as “I don’t know” and “HaHa?” In this paper, we propose a novel adaptive switching mechanism, which learns to automatically transit between ground-truth learning and generated learning regarding the word-level matching score, such as the cosine similarity. Experimental results on both Chinese STC dataset and English Reddit dataset, show that our adaptive method achieves a significant improvement in terms of metric-based evaluation and human evaluation, as compared with the state-of-the-art exposure bias approaches. Further analysis on NMT task also shows that our model can achieve a significant improvement.

pdf bib
Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization
Junpeng Liu | Yanyan Zou | Hainan Zhang | Hongshen Chen | Zhuoye Ding | Caixia Yuan | Xiaojie Wang
Findings of the Association for Computational Linguistics: EMNLP 2021

Unlike well-structured text, such as news reports and encyclopedia articles, dialogue content often comes from two or more interlocutors, exchanging information with each other. In such a scenario, the topic of a conversation can vary upon progression and the key information for a certain topic is often scattered across multiple utterances of different speakers, which poses challenges to abstractly summarize dialogues. To capture the various topic information of a conversation and outline salient facts for the captured topics, this work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation objectives, which are expected to implicitly model the topic change and handle information scattering challenges for the dialogue summarization task. The proposed contrastive objectives are framed as auxiliary tasks for the primary dialogue summarization task, united via an alternative parameter updating strategy. Extensive experiments on benchmark datasets demonstrate that the proposed simple method significantly outperforms strong baselines and achieves new state-of-the-art performance. The code and trained models are publicly available via .

pdf bib
FCM: A Fine-grained Comparison Model for Multi-turn Dialogue Reasoning
Xu Wang | Hainan Zhang | Shuai Zhao | Yanyan Zou | Hongshen Chen | Zhuoye Ding | Bo Cheng | Yanyan Lan
Findings of the Association for Computational Linguistics: EMNLP 2021

Despite the success of neural dialogue systems in achieving high performance on the leader-board, they cannot meet users’ requirements in practice, due to their poor reasoning skills. The underlying reason is that most neural dialogue models only capture the syntactic and semantic information, but fail to model the logical consistency between the dialogue history and the generated response. Recently, a new multi-turn dialogue reasoning task has been proposed, to facilitate dialogue reasoning research. However, this task is challenging, because there are only slight differences between the illogical response and the dialogue history. How to effectively solve this challenge is still worth exploring. This paper proposes a Fine-grained Comparison Model (FCM) to tackle this problem. Inspired by human’s behavior in reading comprehension, a comparison mechanism is proposed to focus on the fine-grained differences in the representation of each response candidate. Specifically, each candidate representation is compared with the whole history to obtain a history consistency representation. Furthermore, the consistency signals between each candidate and the speaker’s own history are considered to drive a model prefer a candidate that is logically consistent with the speaker’s history logic. Finally, the above consistency representations are employed to output a ranking list of the candidate responses for multi-turn dialogue reasoning. Experimental results on two public dialogue datasets show that our method obtains higher ranking scores than the baseline models.

2019

pdf bib
ReCoSa: Detecting the Relevant Contexts with Self-Attention for Multi-turn Dialogue Generation
Hainan Zhang | Yanyan Lan | Liang Pang | Jiafeng Guo | Xueqi Cheng
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In multi-turn dialogue generation, response is usually related with only a few contexts. Therefore, an ideal model should be able to detect these relevant contexts and produce a suitable response accordingly. However, the widely used hierarchical recurrent encoder-decoder models just treat all the contexts indiscriminately, which may hurt the following response generation process. Some researchers try to use the cosine similarity or the traditional attention mechanism to find the relevant contexts, but they suffer from either insufficient relevance assumption or position bias problem. In this paper, we propose a new model, named ReCoSa, to tackle this problem. Firstly, a word level LSTM encoder is conducted to obtain the initial representation of each context. Then, the self-attention mechanism is utilized to update both the context and masked response representation. Finally, the attention weights between each context and response representations are computed and used in the further decoding process. Experimental results on both Chinese customer services dataset and English Ubuntu dialogue dataset show that ReCoSa significantly outperforms baseline models, in terms of both metric-based and human evaluations. Further analysis on attention shows that the detected relevant contexts by ReCoSa are highly coherent with human’s understanding, validating the correctness and interpretability of ReCoSa.

2018

pdf bib
Tailored Sequence to Sequence Models to Different Conversation Scenarios
Hainan Zhang | Yanyan Lan | Jiafeng Guo | Jun Xu | Xueqi Cheng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sequence to sequence (Seq2Seq) models have been widely used for response generation in the area of conversation. However, the requirements for different conversation scenarios are distinct. For example, customer service requires the generated responses to be specific and accurate, while chatbot prefers diverse responses so as to attract different users. The current Seq2Seq model fails to meet these diverse requirements, by using a general average likelihood as the optimization criteria. As a result, it usually generates safe and commonplace responses, such as ‘I don’t know’. In this paper, we propose two tailored optimization criteria for Seq2Seq to different conversation scenarios, i.e., the maximum generated likelihood for specific-requirement scenario, and the conditional value-at-risk for diverse-requirement scenario. Experimental results on the Ubuntu dialogue corpus (Ubuntu service scenario) and Chinese Weibo dataset (social chatbot scenario) show that our proposed models not only satisfies diverse requirements for different scenarios, but also yields better performances against traditional Seq2Seq models in terms of both metric-based and human evaluations.