Behnam Hedayatnia


2022

pdf bib
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Sarik Ghazarian | Behnam Hedayatnia | Alexandros Papangelis | Yang Liu | Dilek Hakkani-Tur
Findings of the Association for Computational Linguistics: ACL 2022

Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.

pdf bib
A Systematic Evaluation of Response Selection for Open Domain Dialogue
Behnam Hedayatnia | Di Jin | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Recent progress on neural approaches for language processing has triggered a resurgence of interest on building intelligent open-domain chatbots. However, even the state-of-the-art neural chatbots cannot produce satisfying responses for every turn in a dialog. A practical solution is to generate multiple response candidates for the same context, and then perform response ranking/selection to determine which candidate is the best. Previous work in response selection typically trains response rankers using synthetic data that is formed from existing dialogs by using a ground truth response as the single appropriate response and constructing inappropriate responses via random selection or using adversarial methods. In this work, we curated a dataset where responses from multiple response generators produced for the same dialog context are manually annotated as appropriate (positive) and inappropriate (negative). We argue that such training data better matches the actual use case examples, enabling the models to learn to rank responses effectively. With this new dataset, we conduct a systematic evaluation of state-of-the-art methods for response selection, and demonstrate that both strategies of using multiple positive candidates and using manually verified hard negative candidates can bring in significant performance improvement in comparison to using the adversarial training data, e.g., increase of 3% and 13% in Recall@1 score, respectively.

pdf bib
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Pei Zhou | Karthik Gopalakrishnan | Behnam Hedayatnia | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Implicit knowledge, such as common sense, is key to fluid human conversations. Current neural response generation (RG) models are trained to generate responses directly, omitting unstated implicit knowledge. In this paper, we present Think-Before-Speaking (TBS), a generative approach to first externalize implicit commonsense knowledge (think) and use this knowledge to generate responses (speak). We argue that externalizing implicit knowledge allows more efficient learning, produces more informative responses, and enables more explainable models. We analyze different choices to collect knowledge-aligned dialogues, represent implicit knowledge, and transition between knowledge and dialogues. Empirical results show TBS models outperform end-to-end and knowledge-augmented RG baselines on most automatic metrics and generate more informative, specific, and commonsense-following responses, as evaluated by human annotators. TBS also generates knowledge that makes sense and is relevant to the dialogue around 85% of the time

2021

pdf bib
Think Before You Speak: Learning to Generate Implicit Knowledge for Response Generation by Self-Talk
Pei Zhou | Behnam Hedayatnia | Karthik Gopalakrishnan | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Humans make appropriate responses not only based on previous dialogue utterances but also on implicit background knowledge such as common sense. Although neural response generation models seem to produce human-like responses, they are mostly end-to-end and not generating intermediate grounds between a dialogue history and responses. This work aims to study if and how we can train an RG model that talks with itself to generate implicit knowledge before making responses. We further investigate can such models identify when to generate implicit background knowledge and when it is not necessary. Experimental results show that compared with models that directly generate responses given a dialogue history, self-talk models produce better-quality responses according to human evaluation on grammaticality, coherence, and engagingness. And models that are trained to identify when to self-talk further improves the response quality. Analysis on generated implicit knowledge shows that models mostly use the knowledge appropriately in the responses.

pdf bib
Multi-Sentence Knowledge Selection in Open-Domain Dialogue
Mihail Eric | Nicole Chartier | Behnam Hedayatnia | Karthik Gopalakrishnan | Pankaj Rajan | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 14th International Conference on Natural Language Generation

Incorporating external knowledge sources effectively in conversations is a longstanding problem in open-domain dialogue research. The existing literature on open-domain knowledge selection is limited and makes certain brittle assumptions on knowledge sources to simplify the overall task, such as the existence of a single relevant knowledge sentence per context. In this work, we evaluate the existing state of open-domain conversation knowledge selection, showing where the existing methodologies regarding data and evaluation are flawed. We then improve on them by proposing a new framework for collecting relevant knowledge, and create an augmented dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++. WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing the inherent ambiguity of open-domain dialogue knowledge selection. We then benchmark various knowledge ranking algorithms on this augmented dataset with both intrinsic evaluation and extrinsic measures of response quality, showing that neural rerankers that use WOW++ can outperform rankers trained on standard datasets.

pdf bib
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
Pei Zhou | Karthik Gopalakrishnan | Behnam Hedayatnia | Seokhwan Kim | Jay Pujara | Xiang Ren | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Smooth and effective communication requires the ability to perform latent or explicit commonsense inference. Prior commonsense reasoning benchmarks (such as SocialIQA and CommonsenseQA) mainly focus on the discriminative task of choosing the right answer from a set of candidates, and do not involve interactive language generation as in dialogue. Moreover, existing dialogue datasets do not explicitly focus on exhibiting commonsense as a facet. In this paper, we present an empirical study of commonsense in dialogue response generation. We first auto-extract commonsensical dialogues from existing dialogue datasets by leveraging ConceptNet, a commonsense knowledge graph. Furthermore, building on social contexts/situations in SocialIQA, we collect a new dialogue dataset with 25K dialogues aimed at exhibiting social commonsense in an interactive setting. We evaluate response generation models trained using these datasets and find that models trained on both extracted and our collected data produce responses that consistently exhibit more commonsense than baselines. Finally we propose an approach for automatic evaluation of commonsense that relies on features derived from ConceptNet and pre-trained language and dialog models, and show reasonable correlation with human evaluation of responses’ commonsense quality.

2020

pdf bib
Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access
Seokhwan Kim | Mihail Eric | Karthik Gopalakrishnan | Behnam Hedayatnia | Yang Liu | Dilek Hakkani-Tur
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Most prior work on task-oriented dialogue systems are restricted to a limited coverage of domain APIs, while users oftentimes have domain related requests that are not covered by the APIs. In this paper, we propose to expand coverage of task-oriented dialogue systems by incorporating external unstructured knowledge sources. We define three sub-tasks: knowledge-seeking turn detection, knowledge selection, and knowledge-grounded response generation, which can be modeled individually or jointly. We introduce an augmented version of MultiWOZ 2.1, which includes new out-of-API-coverage turns and responses grounded on external knowledge sources. We present baselines for each sub-task using both conventional and neural approaches. Our experimental results demonstrate the need for further research in this direction to enable more informative conversational systems.

pdf bib
Incorporating Commonsense Knowledge Graph in Pretrained Models for Social Commonsense Tasks
Ting-Yun Chang | Yang Liu | Karthik Gopalakrishnan | Behnam Hedayatnia | Pei Zhou | Dilek Hakkani-Tur
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Pretrained language models have excelled at many NLP tasks recently; however, their social intelligence is still unsatisfactory. To enable this, machines need to have a more general understanding of our complicated world and develop the ability to perform commonsense reasoning besides fitting the specific downstream tasks. External commonsense knowledge graphs (KGs), such as ConceptNet, provide rich information about words and their relationships. Thus, towards general commonsense learning, we propose two approaches to implicitly and explicitly infuse such KGs into pretrained language models. We demonstrate our proposed methods perform well on SocialIQA, a social commonsense reasoning task, in both limited and full training data regimes.

pdf bib
Policy-Driven Neural Response Generation for Knowledge-Grounded Dialog Systems
Behnam Hedayatnia | Karthik Gopalakrishnan | Seokhwan Kim | Yang Liu | Mihail Eric | Dilek Hakkani-Tur
Proceedings of the 13th International Conference on Natural Language Generation

Open-domain dialog systems aim to generate relevant, informative and engaging responses. In this paper, we propose using a dialog policy to plan the content and style of target, open domain responses in the form of an action plan, which includes knowledge sentences related to the dialog context, targeted dialog acts, topic information, etc. For training, the attributes within the action plan are obtained by automatically annotating the publicly released Topical-Chat dataset. We condition neural response generators on the action plan which is then realized as target utterances at the turn and sentence levels. We also investigate different dialog policy models to predict an action plan given the dialog context. Through automated and human evaluation, we measure the appropriateness of the generated responses and check if the generation models indeed learn to realize the given action plans. We demonstrate that a basic dialog policy that operates at the sentence level generates better responses in comparison to turn level generation as well as baseline models with no action plan. Additionally the basic dialog policy has the added benefit of controllability.

2019

pdf bib
Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators
Sanghyun Yi | Rahul Goel | Chandra Khatri | Alessandra Cervone | Tagyoung Chung | Behnam Hedayatnia | Anu Venkatesh | Raefer Gabriel | Dilek Hakkani-Tur
Proceedings of the 12th International Conference on Natural Language Generation

Encoder-decoder based neural architectures serve as the basis of state-of-the-art approaches in end-to-end open domain dialog systems. Since most of such systems are trained with a maximum likelihood (MLE) objective they suffer from issues such as lack of generalizability and the generic response problem, i.e., a system response that can be an answer to a large number of user utterances, e.g., “Maybe, I don’t know.” Having explicit feedback on the relevance and interestingness of a system response at each turn can be a useful signal for mitigating such issues and improving system quality by selecting responses from different approaches. Towards this goal, we present a system that evaluates chatbot responses at each dialog turn for coherence and engagement. Our system provides explicit turn-level dialog quality feedback, which we show to be highly correlated with human evaluation. To show that incorporating this feedback in the neural response generation models improves dialog quality, we present two different and complementary mechanisms to incorporate explicit feedback into a neural response generation model: reranking and direct modification of the loss function during training. Our studies show that a response generation model that incorporates these combined feedback mechanisms produce more engaging and coherent responses in an open-domain spoken dialog setting, significantly improving the response quality using both automatic and human evaluation.

pdf bib
Natural Language Generation at Scale: A Case Study for Open Domain Question Answering
Alessandra Cervone | Chandra Khatri | Rahul Goel | Behnam Hedayatnia | Anu Venkatesh | Dilek Hakkani-Tur | Raefer Gabriel
Proceedings of the 12th International Conference on Natural Language Generation

Current approaches to Natural Language Generation (NLG) for dialog mainly focus on domain-specific, task-oriented applications (e.g. restaurant booking) using limited ontologies (up to 20 slot types), usually without considering the previous conversation context. Furthermore, these approaches require large amounts of data for each domain, and do not benefit from examples that may be available for other domains. This work explores the feasibility of applying statistical NLG to scenarios requiring larger ontologies, such as multi-domain dialog applications or open-domain question answering (QA) based on knowledge graphs. We model NLG through an Encoder-Decoder framework using a large dataset of interactions between real-world users and a conversational agent for open-domain QA. First, we investigate the impact of increasing the number of slot types on the generation quality and experiment with different partitions of the QA data with progressively larger ontologies (up to 369 slot types). Second, we perform multi-task learning experiments between open-domain QA and task-oriented dialog, and benchmark our model on a popular NLG dataset. Moreover, we experiment with using the conversational context as an additional input to improve response generation quality. Our experiments show the feasibility of learning statistical NLG models for open-domain QA with larger ontologies.