Hongjin Qian


pdf bib
Less is More: Learning to Refine Dialogue History for Personalized Dialogue Generation
Hanxun Zhong | Zhicheng Dou | Yutao Zhu | Hongjin Qian | Ji-Rong Wen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Personalized dialogue systems explore the problem of generating responses that are consistent with the user’s personality, which has raised much attention in recent years. Existing personalized dialogue systems have tried to extract user profiles from dialogue history to guide personalized response generation. Since the dialogue history is usually long and noisy, most existing methods truncate the dialogue history to model the user’s personality. Such methods can generate some personalized responses, but a large part of dialogue history is wasted, leading to sub-optimal performance of personalized response generation. In this work, we propose to refine the user dialogue history on a large scale, based on which we can handle more dialogue history and obtain more abundant and accurate persona information. Specifically, we design an MSP model which consists of three personal information refiners and a personalized response generator. With these multi-level refiners, we can sparsely extract the most valuable information (tokens) from the dialogue history and leverage other similar users’ data to enhance personalization. Experimental results on two real-world datasets demonstrate the superiority of our model in generating more informative and personalized responses.

pdf bib
ConvTrans: Transforming Web Search Sessions for Conversational Dense Retrieval
Kelong Mao | Zhicheng Dou | Hongjin Qian | Fengran Mo | Xiaohua Cheng | Zhao Cao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Conversational search provides users with a natural and convenient new search experience. Recently, conversational dense retrieval has shown to be a promising technique for realizing conversational search. However, as conversational search systems have not been widely deployed, it is hard to get large-scale real conversational search sessions and relevance labels to support the training of conversational dense retrieval. To tackle this data scarcity problem, previous methods focus on developing better few-shot learning approaches or generating pseudo relevance labels, but the data they use for training still heavily rely on manual generation.In this paper, we present ConvTrans, a data augmentation method that can automatically transform easily-accessible web search sessions into conversational search sessions to fundamentally alleviate the data scarcity problem for conversational dense retrieval. ConvTrans eliminates the gaps between these two types of sessions in terms of session quality and query form to achieve effective session transformation. Extensive evaluations on two widely used conversational search benchmarks, i.e., CAsT-19 and CAsT-20, demonstrate that the same model trained on the data generated by ConvTrans can achieve comparable retrieval performance as it trained on high-quality but expensive artificial conversational search data.

pdf bib
Explicit Query Rewriting for Conversational Dense Retrieval
Hongjin Qian | Zhicheng Dou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

In a conversational search scenario, a query might be context-dependent because some words are referred to previous expressions or omitted. Previous works tackle the issue by either reformulating the query into a self-contained query (query rewriting) or learning a contextualized query embedding from the query context (context modelling). In this paper, we propose a model CRDR that can perform query rewriting and context modelling in a unified framework in which the query rewriting’s supervision signals further enhance the context modelling. Instead of generating a new query, CRDR only performs necessary modifications on the original query, which improves both accuracy and efficiency of query rewriting. In the meantime, the query rewriting benefits the context modelling by explicitly highlighting relevant terms in the query context, which improves the quality of the learned contextualized query embedding. To verify the effectiveness of CRDR, we perform comprehensive experiments on TREC CAsT-19 and TREC CAsT-20 datasets, and the results show that our method outperforms all baseline models in terms of both quality of query rewriting and quality of context-aware ranking.


pdf bib
Speaker or Listener? The Role of a Dialog Agent
Yafei Liu | Hongjin Qian | Hengpeng Xu | Jinmao Wei
Findings of the Association for Computational Linguistics: EMNLP 2020

For decades, chitchat bots are designed as a listener to passively answer what people ask. This passive and relatively simple dialogue mechanism gains less attention from humans and consumes the interests of human beings rapidly. Therefore some recent researches attempt to endow the bots with proactivity through external knowledge to transform the role from a listener to a speaker with a hypothesis that the speaker expresses more just like a knowledge disseminator. However, along with the proactive manner introduced into a dialogue agent, an issue arises that, with too many knowledge facts to express, the agent starts to talks endlessly, and even completely ignores what the other expresses in dialogue sometimes, which greatly harms the interest of the other chatter to continue the conversation. To the end, we propose a novel model named Initiative-Imitate to interact with adaptive initiative throughout a dialogue. It forces the agent to express in parallel with the appropriate role during the whole conversation. The corresponding experiments show the proposed Initiative-Imitate obtains competitive results both on the automatic and manual metrics. And the fluency and engagement of the chatbot have also been improved significantly. Besides, the case study indicates the Initiative-Imitate can constantly transfer to appropriate role timely and response more properly during the whole continuous conversation.