Emily Dinan


pdf bib
Dialogue in the Wild: Learning from a Deployed Role-Playing Game with Humans and Bots
Kurt Shuster | Jack Urbanek | Emily Dinan | Arthur Szlam | Jason Weston
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Bot-Adversarial Dialogue for Safe Conversational Agents
Jing Xu | Da Ju | Margaret Li | Y-Lan Boureau | Jason Weston | Emily Dinan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Conversational agents trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which include offensive or otherwise toxic behavior. We introduce a new human-and-model-in-the-loop framework for evaluating the toxicity of such models, and compare a variety of existing methods in both the cases of non-adversarial and adversarial users that expose their weaknesses. We then go on to propose two novel methods for safe conversational agents, by either training on data from our new human-and-model-in-the-loop framework in a two-stage system, or ”baking-in” safety to the generative model itself. We find our new techniques are (i) safer than existing models; while (ii) maintaining usability metrics such as engagingness relative to state-of-the-art chatbots. In contrast, we expose serious safety issues in existing standard systems like GPT2, DialoGPT, and BlenderBot.

pdf bib
Recipes for Building an Open-Domain Chatbot
Stephen Roller | Emily Dinan | Naman Goyal | Da Ju | Mary Williamson | Yinhan Liu | Jing Xu | Myle Ott | Eric Michael Smith | Y-Lan Boureau | Jason Weston
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we highlight other ingredients. Good conversation requires blended skills: providing engaging talking points, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent persona. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models and code publicly available. Human evaluations show our best models outperform existing approaches in multi-turn dialogue on engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.


pdf bib
Multi-Dimensional Gender Bias Classification
Emily Dinan | Angela Fan | Ledell Wu | Jason Weston | Douwe Kiela | Adina Williams
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender biased text. In this work, we propose a novel, general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information. In addition, we collect a new, crowdsourced evaluation benchmark. Distinguishing between gender bias along multiple dimensions enables us to train better and more fine-grained gender bias classifiers. We show our classifiers are valuable for a variety of applications, like controlling for gender bias in generative models, detecting gender bias in arbitrary text, and classifying text as offensive based on its genderedness.

pdf bib
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation
Emily Dinan | Angela Fan | Adina Williams | Jack Urbanek | Douwe Kiela | Jason Weston
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Social biases present in data are often directly reflected in the predictions of models trained on that data. We analyze gender bias in dialogue data, and examine how this bias is not only replicated, but is also amplified in subsequent generative chit-chat dialogue models. We measure gender bias in six existing dialogue datasets before selecting the most biased one, the multi-player text-based fantasy adventure dataset LIGHT, as a testbed for bias mitigation techniques. We consider three techniques to mitigate gender bias: counterfactual data augmentation, targeted data collection, and bias controlled training. We show that our proposed techniques mitigate gender bias by balancing the genderedness of generated dialogue utterances, and find that they are particularly effective in combination. We evaluate model performance with a variety of quantitative methods—including the quantity of gendered words, a dialogue safety classifier, and human assessments—all of which show that our models generate less gendered, but equally engaging chit-chat responses.

pdf bib
The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents
Kurt Shuster | Da Ju | Stephen Roller | Emily Dinan | Y-Lan Boureau | Jason Weston
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producing a single unified agent that can perceive, reason and converse with humans in an open-domain setting. We show that such multi-tasking improves over a BERT pre-trained baseline, largely due to multi-tasking with very large dialogue datasets in a similar domain, and that the multi-tasking in general provides gains to both text and image-based tasks using several metrics in both the fine-tune and task transfer settings. We obtain state-of-the-art results on many of the tasks, providing a strong baseline for this challenge.

pdf bib
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie | Adina Williams | Emily Dinan | Mohit Bansal | Jason Weston | Douwe Kiela
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can be applied in a never-ending learning scenario, becoming a moving target for NLU, rather than a static benchmark that will quickly saturate.


pdf bib
Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek | Angela Fan | Siddharth Karamcheti | Saachi Jain | Samuel Humeau | Emily Dinan | Tim Rocktäschel | Douwe Kiela | Arthur Szlam | Jason Weston
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.

pdf bib
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Emily Dinan | Samuel Humeau | Bharath Chintagunta | Jason Weston
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The detection of offensive language in the context of a dialogue has become an increasingly important application of natural language processing. The detection of trolls in public forums (Galán-García et al., 2016), and the deployment of chatbots in the public domain (Wolf et al., 2017) are two examples that show the necessity of guarding against adversarially offensive behavior on the part of humans. In this work, we develop a training scheme for a model to become robust to such human attacks by an iterative build it, break it, fix it scheme with humans and models in the loop. In detailed experiments we show this approach is considerably more robust than previous systems. Further, we show that offensive language used within a conversation critically depends on the dialogue context, and cannot be viewed as a single sentence offensive detection task as in most previous work. Our newly collected tasks and methods are all made open source and publicly available.


pdf bib
Personalizing Dialogue Agents: I have a dog, do you have pets too?
Saizheng Zhang | Emily Dinan | Jack Urbanek | Arthur Szlam | Douwe Kiela | Jason Weston
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i)condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

pdf bib
Retrieve and Refine: Improved Sequence Generation Models For Dialogue
Jason Weston | Emily Dinan | Alexander Miller
Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI

Sequence generation models for dialogue are known to have several problems: they tend to produce short, generic sentences that are uninformative and unengaging. Retrieval models on the other hand can surface interesting responses, but are restricted to the given retrieval set leading to erroneous replies that cannot be tuned to the specific context. In this work we develop a model that combines the two approaches to avoid both their deficiencies: first retrieve a response and then refine it – the final sequence generator treating the retrieval as additional context. We show on the recent ConvAI2 challenge task our approach produces responses superior to both standard retrieval and generation models in human evaluations.