William B. Dolan

Also published as: Bill Dolan, William Dolan


2024

Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs
Claire Jin | Sudha Rao | Xiangyu Peng | Portia Botchway | Jessica Quaye | Chris Brockett | Bill Dolan
Findings of the Association for Computational Linguistics: ACL 2024

Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need to collect additional data such as post-play surveys. Applied to the text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.

Investigating Agency of LLMs in Human-AI Collaboration Tasks
Ashish Sharma | Sudha Rao | Chris Brockett | Akanksha Malhotra | Nebojsa Jojic | Bill Dolan
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Agency, the capacity to proactively shape events, is central to how humans interact and collaborate. While LLMs are being developed to simulate human behavior and serve as human-like agents, little attention has been given to the Agency that these models should possess in order to proactively manage the direction of interaction and collaboration. In this paper, we investigate Agency as a desirable function of LLMs, and how it can be measured and managed. We build on social-cognitive theory to develop a framework of features through which Agency is expressed in dialogue – indicating what you intend to do (Intentionality), motivating your intentions (Motivation), having self-belief in intentions (Self-Efficacy), and being able to self-adjust (Self-Regulation). We collect a new dataset of 83 human-human collaborative interior design conversations containing 908 conversational snippets annotated for Agency features. Using this dataset, we develop methods for measuring Agency of LLMs. Automatic and human evaluations show that models that manifest features associated with high Intentionality, Motivation, Self-Efficacy, and Self-Regulation are more likely to be perceived as strongly agentive.

2023

Interactive Text Generation
Felix Faltings | Michel Galley | Kianté Brantley | Baolin Peng | Weixin Cai | Yizhe Zhang | Jianfeng Gao | Bill Dolan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Users interact with text, image, code, or other editors on a daily basis. However, machine learning models are rarely trained in settings that reflect the interactivity between users and their editors. This is understandable: training AI models with real users is not only slow and costly, but what these models learn may also be specific to user-interface design choices. Unfortunately, this means most of the research on text, code, and image generation has focused on non-interactive settings, in which the model is expected to get everything right without accounting for any input from a user who may be willing to help. We introduce a new Interactive Text Generation task that allows training generation models interactively without the costs of involving real users, by using user simulators that provide edits that guide the model towards a given target text. We train our interactive models using Imitation Learning, and our experiments against competitive non-interactive generation models show that models trained interactively are superior to their non-interactive counterparts, even when all models are given the same budget of user inputs or edits.
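
As a concrete illustration of the interaction loop described above, here is a toy sketch (not the paper's simulator or imitation-learning setup) in which a simulated user supplies one corrective edit per round, nudging a draft toward a known target text; the edit rule is a hypothetical stand-in for a richer simulator.

    # Toy user simulator: proposes one corrective edit per round, moving the
    # draft toward a known target. Purely illustrative; the paper trains
    # generation models with imitation learning against richer simulators.

    def simulate_user_edit(draft_tokens, target_tokens):
        """Return (position, token) for the first place the draft disagrees with the target."""
        for i, tok in enumerate(target_tokens):
            if i >= len(draft_tokens) or draft_tokens[i] != tok:
                return i, tok
        return None

    def interactive_session(draft, target, max_rounds=10):
        draft_tokens, target_tokens = draft.split(), target.split()
        for _ in range(max_rounds):
            edit = simulate_user_edit(draft_tokens, target_tokens)
            if edit is None:
                break
            pos, tok = edit
            draft_tokens.insert(pos, tok)   # apply the simulated user's edit
            # A real system would let the generation model revise the draft here.
        return " ".join(draft_tokens)

    print(interactive_session("the cat sat", "the black cat sat on the mat"))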

Towards More Efficient Insertion Transformer with Fractional Positional Encoding
Zhisong Zhang | Yizhe Zhang | Bill Dolan
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Auto-regressive neural sequence models have been shown to be effective across text generation tasks. However, their left-to-right decoding order prevents generation from being parallelized. Insertion Transformer (Stern et al., 2019) is an attractive alternative that allows outputting multiple tokens in a single generation step. Nevertheless, due to the incompatibility between absolute positional encoding and insertion-based generation schemes, it needs to refresh the encoding of every token in the generated partial hypothesis at each step, which could be costly. We design a novel reusable positional encoding scheme for Insertion Transformers called Fractional Positional Encoding (FPE), which allows reusing representations calculated in previous steps. Empirical studies on various text generation tasks demonstrate the effectiveness of FPE, which leads to floating-point operation reduction and latency improvements on batched decoding.
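
A rough numeric illustration of the reuse idea, as we read it from the abstract (not the paper's exact encoding scheme): tokens already in the partial hypothesis keep their positions, and a newly inserted token receives a fractional position between its neighbors, so previously computed representations need not be refreshed.

    # Toy illustration: give an inserted token a fractional position between its
    # neighbors so existing tokens keep the positions they were already encoded with.
    def fractional_position(left_pos, right_pos):
        return (left_pos + right_pos) / 2.0

    positions = {"The": 0.0, "sat": 1.0, "mat": 2.0}        # partial hypothesis
    positions["cat"] = fractional_position(0.0, 1.0)        # inserted between "The" and "sat"
    positions["on"] = fractional_position(1.0, 2.0)         # inserted between "sat" and "mat"
    print(sorted(positions.items(), key=lambda kv: kv[1]))
    # [('The', 0.0), ('cat', 0.5), ('sat', 1.0), ('on', 1.5), ('mat', 2.0)]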

2022

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu | Yizhe Zhang | Chris Brockett | Yi Mao | Zhifang Sui | Weizhu Chen | Bill Dolan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large pretrained generative models like GPT-3 often suffer from hallucinating non-existent or incorrect content, which undermines their potential merits in real applications. Existing work usually attempts to detect these hallucinations based on a corresponding oracle reference at the sentence or document level. However, ground-truth references may not be readily available for many free-form text generation applications, and sentence- or document-level detection may fail to provide the fine-grained signals that would prevent fallacious content in real time. As a first step to addressing these issues, we propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDeS (HAllucination DEtection dataSet). To create this dataset, we first perturb a large number of text segments extracted from English-language Wikipedia, and then verify these with crowd-sourced annotations. To mitigate label imbalance during annotation, we utilize an iterative model-in-the-loop strategy. We conduct comprehensive data analyses and create multiple baseline models.
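
Since the task is framed as token-level binary classification (hallucinated or not), a generic baseline can be sketched with a standard token-classification head from the Hugging Face transformers library; the checkpoint and label mapping below are assumptions, and this is not one of the paper's baseline models (the head would still need fine-tuning on HaDeS).

    # Generic token-level hallucination detection baseline sketch.
    # Checkpoint and label mapping are illustrative assumptions; the
    # classification head is untrained here and would be fine-tuned on HaDeS.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # 0 = supported, 1 = hallucinated (assumed)

    text = "The Eiffel Tower was completed in 1889 and is located in Berlin."
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits      # shape: (1, sequence_length, 2)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    for token, label in zip(tokens, logits.argmax(dim=-1)[0].tolist()):
        print(token, "hallucinated" if label == 1 else "supported")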

Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation
Faeze Brahman | Baolin Peng | Michel Galley | Sudha Rao | Bill Dolan | Snigdha Chaturvedi | Jianfeng Gao
Findings of the Association for Computational Linguistics: EMNLP 2022

Large pre-trained language models have recently enabled open-ended generation frameworks (e.g., prompt-to-text NLG) to tackle a variety of tasks going beyond the traditional data-to-text generation. While this framework is more general, it is under-specified and often leads to a lack of controllability, restricting its real-world usage. We propose a new grounded keys-to-text generation task: generate a factual description of an entity given a set of guiding keys and grounding passages. To address this task, we introduce a new dataset, called EntDeGen. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for the factual correctness of generated descriptions. Our EntDescriptor model is equipped with strong rankers to fetch helpful passages and generate entity descriptions. Experimental results show a good correlation (60.14) between our proposed metric and human judgments of factuality. Our rankers significantly improved the factual correctness of generated descriptions (15.95% and 34.51% relative gains in recall and precision). Finally, our ablation study highlights the benefit of combining keys and groundings.

Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code
Ryan Volum | Sudha Rao | Michael Xu | Gabriel DesGarennes | Chris Brockett | Benjamin Van Durme | Olivia Deng | Akanksha Malhotra | Bill Dolan
Proceedings of the 3rd Wordplay: When Language Meets Games Workshop (Wordplay 2022)

Non-Player Characters (NPCs) significantly enhance the player experience in many games. Historically, players’ interactions with NPCs have tended to be highly scripted, limited to natural language responses selected by the player, and devoid of dynamic changes in game state. In this work, we demonstrate that use of a few example conversational prompts can power a conversational agent to generate both natural language and novel code. This approach can permit the development of NPCs with which players can have grounded conversations that are free-form and less repetitive. We demonstrate our approach using OpenAI Codex (GPT-3 fine-tuned on GitHub), with Minecraft game development as our test bed. We show that with a few example prompts, a Codex-based agent can generate novel code, hold multi-turn conversations, and answer questions about structured data. We evaluate this application with experienced gamers in a Minecraft realm, provide an analysis of failure cases, and suggest possible directions for solutions.

What Makes Good In-Context Examples for GPT-3?
Jiachang Liu | Dinghan Shen | Yizhe Zhang | Bill Dolan | Lawrence Carin | Weizhu Chen
Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

GPT-3 has attracted considerable attention due to its superior performance across a wide range of NLP tasks, especially its in-context learning abilities. Despite its success, we found that the empirical results of GPT-3 depend heavily on the choice of in-context examples. In this work, we investigate whether there are more effective strategies for judiciously selecting in-context examples (relative to random sampling) that better leverage GPT-3’s in-context learning capabilities. Inspired by the recent success of leveraging a retrieval module to augment neural networks, we propose to retrieve examples that are semantically similar to a test query sample to formulate its corresponding prompt. Intuitively, the examples selected with such a strategy may serve as more informative inputs to unleash GPT-3’s power of text generation. We evaluate the proposed approach on several natural language understanding and generation benchmarks, where the retrieval-based prompt selection approach consistently outperforms the random selection baseline. Moreover, sentence encoders fine-tuned on task-related datasets yield even more helpful retrieval results. Notably, significant gains are observed on tasks such as table-to-text generation (44.3% on the ToTTo dataset) and open-domain question answering (45.5% on the NQ dataset).
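
The retrieval strategy can be sketched as follows: embed the candidate pool and the test query with a sentence encoder and pick the nearest neighbors as in-context examples. The encoder name and prompt template below are illustrative assumptions (the paper also explores encoders fine-tuned on task-related data); the assembled prompt would then be sent to GPT-3.

    # Retrieve semantically similar examples to build the in-context prompt.
    # Encoder choice and prompt template are assumptions, not the paper's exact setup.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed off-the-shelf encoder

    pool = [("What is the capital of France?", "Paris"),
            ("Who wrote Hamlet?", "William Shakespeare"),
            ("What is the capital of Japan?", "Tokyo")]
    query = "What is the capital of Italy?"

    pool_emb = encoder.encode([q for q, _ in pool], normalize_embeddings=True)
    query_emb = encoder.encode([query], normalize_embeddings=True)[0]
    top_k = np.argsort(-(pool_emb @ query_emb))[:2]     # most similar examples first

    prompt = "".join(f"Q: {pool[i][0]}\nA: {pool[i][1]}\n\n" for i in top_k)
    prompt += f"Q: {query}\nA:"
    print(prompt)                                       # would be sent to GPT-3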

2021

Contextualized Perturbation for Textual Adversarial Attack
Dianqi Li | Yizhe Zhang | Hao Peng | Liqun Chen | Chris Brockett | Ming-Ting Sun | Bill Dolan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques of generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. We propose three contextualized perturbations, Replace, Insert and Merge, that allow for generating outputs of varied lengths. CLARE can flexibly combine these perturbations and apply them at any position in the inputs, and is thus able to attack the victim model more effectively with fewer edits. Extensive experiments and human evaluation demonstrate that CLARE outperforms the baselines in terms of attack success rate, textual similarity, fluency and grammaticality.
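
The Replace perturbation can be illustrated with an off-the-shelf masked language model: mask a position and take contextual infills as candidate substitutions. This is only a sketch of the mask-then-infill step, not the released CLARE implementation, which additionally scores candidates against the victim model and similarity constraints.

    # Mask-then-infill "Replace" perturbation with a pretrained masked LM.
    # Sketch only: CLARE also applies Insert/Merge and filters candidates by
    # textual similarity and attack success against a victim model.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="roberta-base")

    sentence = "The movie was absolutely wonderful."
    masked = sentence.replace("wonderful", fill.tokenizer.mask_token)

    for candidate in fill(masked, top_k=5):
        print(candidate["token_str"].strip(), round(candidate["score"], 3))
    # Each candidate substitution would then be checked against the victim classifier.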

Text Editing by Command
Felix Faltings | Michel Galley | Gerold Hintz | Chris Brockett | Chris Quirk | Jianfeng Gao | Bill Dolan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model’s performance.

Contrastive Multi-document Question Generation
Woon Sang Cho | Yizhe Zhang | Sudha Rao | Asli Celikyilmaz | Chenyan Xiong | Jianfeng Gao | Mengdi Wang | Bill Dolan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Multi-document question generation focuses on generating a question that covers the common aspects of multiple documents. Such a model is useful in generating clarifying options. However, a naive model trained only on the targeted (‘positive’) document set may generate questions that are too generic, covering a larger scope than delineated by the document set. To address this challenge, we introduce a contrastive learning strategy: given ‘positive’ and ‘negative’ sets of documents, we generate a question that is closely related to the ‘positive’ set but far away from the ‘negative’ set. This setting allows generated questions to be more specific and related to the target document set. To generate such specific questions, we propose Multi-Source Coordinated Question Generator (MSCQG), a novel framework that includes a supervised learning (SL) stage and a reinforcement learning (RL) stage. In the SL stage, a single-document question generator is trained. In the RL stage, a coordinator model is trained to find optimal attention weights to align multiple single-document generators, by optimizing a reward designed to promote the specificity of generated questions. We also develop an effective auxiliary objective, named Set-induced Contrastive Regularization (SCR), that improves the coordinator’s contrastive learning during the RL stage. We show that our model significantly outperforms several strong baselines, as measured by automatic metrics and human evaluation. The source repository is publicly available at www.github.com/woonsangcho/contrast_qgen.

Automatic Document Sketching: Generating Drafts from Analogous Texts
Zeqiu Wu | Michel Galley | Chris Brockett | Yizhe Zhang | Bill Dolan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela Lin | Sudha Rao | Asli Celikyilmaz | Elnaz Nouri | Chris Brockett | Debadeepta Dey | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many high-level procedural tasks can be decomposed into sequences of instructions that vary in their order and choice of tools. In the cooking domain, the web offers many partially overlapping text and video recipes (i.e. procedures) that describe how to make the same dish (i.e. high-level task). Aligning instructions for the same dish across different sources can yield descriptive visual explanations that are far richer semantically than conventional textual instructions, providing commonsense insight into how real-world procedures are structured. Learning to align these different instruction sets is challenging because: a) different recipes vary in their order of instructions and use of ingredients; and b) video instructions can be noisy and tend to contain far more information than text instructions. To address these challenges, we use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish. We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish. We release the Microsoft Research Multimodal Aligned Recipe Corpus, containing ~150K pairwise alignments between recipes across 4,262 dishes with rich commonsense information.
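
One simple way to picture a pairwise alignment between two instruction lists is dynamic programming over a sentence-similarity matrix, as in the generic sketch below (the token-overlap similarity and gap penalty are arbitrary choices; the paper's unsupervised alignment algorithm and its graph-based joint alignment are more involved).

    # Generic DP alignment of two instruction sequences using token-overlap
    # similarity. Illustrative only; not the corpus construction algorithm.
    def similarity(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(1, len(ta | tb))          # Jaccard overlap

    def align(recipe_a, recipe_b, gap=-0.1):
        n, m = len(recipe_a), len(recipe_b)
        score = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = score[i - 1][0] + gap
        for j in range(1, m + 1):
            score[0][j] = score[0][j - 1] + gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                match = score[i - 1][j - 1] + similarity(recipe_a[i - 1], recipe_b[j - 1])
                score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
        pairs, i, j = [], n, m                               # trace back aligned steps
        while i > 0 and j > 0:
            match = score[i - 1][j - 1] + similarity(recipe_a[i - 1], recipe_b[j - 1])
            if score[i][j] == match:
                pairs.append((recipe_a[i - 1], recipe_b[j - 1]))
                i, j = i - 1, j - 1
            elif score[i][j] == score[i - 1][j] + gap:
                i -= 1
            else:
                j -= 1
        return list(reversed(pairs))

    text_recipe = ["Chop the onions", "Fry the onions in butter", "Add rice and water"]
    video_recipe = ["Dice the onions", "Saute the onions until golden", "Stir in the rice, then add water"]
    print(align(text_recipe, video_recipe))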

MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform
Xiang Gao | Michel Galley | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present MixingBoard, a platform for quickly building demos with a focus on knowledge grounded stylized text generation. We unify existing text generation algorithms in a shared codebase and further adapt earlier algorithms for constrained generation. To borrow advantages from different models, we implement strategies for cross-model integration, from the token probability level to the latent space level. An interface to external knowledge is provided via a module that retrieves, on-the-fly, relevant knowledge from passages on the web or a document collection. A user interface for local development, remote webpage access, and a RESTful API are provided to make it simple for users to build their own demos.

DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Yizhe Zhang | Siqi Sun | Michel Galley | Yen-Chun Chen | Chris Brockett | Xiang Gao | Jianfeng Gao | Jingjing Liu | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains spanning 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain performance close to human, in terms of both automatic and human evaluation, in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful, and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.
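
The released checkpoints can be loaded through the Hugging Face transformers library; below is a minimal single-turn sketch along the lines of the published model card (checkpoint name microsoft/DialoGPT-medium; the decoding settings are illustrative).

    # Minimal DialoGPT usage sketch (single turn), following the public model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

    # Dialogue turns are concatenated, each terminated by the end-of-text token.
    user_turn = "Does money buy happiness?"
    input_ids = tokenizer.encode(user_turn + tokenizer.eos_token, return_tensors="pt")

    output_ids = model.generate(
        input_ids,
        max_length=100,
        pad_token_id=tokenizer.eos_token_id,   # avoid the missing-pad-token warning
        do_sample=True, top_p=0.9,             # illustrative decoding settings
    )
    response = tokenizer.decode(output_ids[0, input_ids.shape[-1]:],
                                skip_special_tokens=True)
    print(response)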

Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao | Yizhe Zhang | Michel Galley | Chris Brockett | Bill Dolan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more follow-up interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between feedback and engagingness, we convert the ranking problem into a comparison of response pairs that involve few confounding factors. We trained DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data, and the resulting ranker outperformed several baselines. In particular, our ranker outperforms the conventional dialog perplexity baseline by a large margin when predicting Reddit feedback. Finally, we combine the feedback prediction models and a human-like scoring model to rank machine-generated dialog responses. Crowd-sourced human evaluation shows that our ranking method correlates better with real human preferences than baseline models do.
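
The released rankers can be used to score context–response pairs; the sketch below follows the public model card for the microsoft/DialogRPT-updown checkpoint (the context and candidate responses are made up, and the separator format is taken from that card).

    # Score candidate responses with a released DialogRPT ranker (up-vote head),
    # following the public model card; the inputs are illustrative.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "microsoft/DialogRPT-updown"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)

    def score(context, response):
        ids = tokenizer.encode(context + "<|endoftext|>" + response,
                               return_tensors="pt")
        with torch.no_grad():
            logits = model(ids).logits
        return torch.sigmoid(logits).item()      # higher = predicted more engaging

    context = "I love NLP!"
    for response in ["Me too!", "Here's a recent paper you might enjoy.", "ok"]:
        print(round(score(context, response), 3), response)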

Substance over Style: Document-Level Targeted Content Transfer
Allison Hegel | Sudha Rao | Asli Celikyilmaz | Bill Dolan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing language models excel at writing from scratch, but many real-world scenarios require rewriting an existing document to fit a set of constraints. Although sentence-level rewriting has been fairly well-studied, little work has addressed the challenge of rewriting an entire document coherently. In this work, we introduce the task of document-level targeted content transfer and address it in the recipe domain, with a recipe as the document and a dietary restriction (such as vegan or dairy-free) as the targeted constraint. We propose a novel model for this task based on the generative pre-trained language model (GPT-2) and train on a large number of roughly-aligned recipe pairs. Both automatic and human evaluations show that our model outperforms existing methods by generating coherent and diverse rewrites that obey the constraint while remaining close to the original document. Finally, we analyze our model’s rewrites to assess progress toward the goal of making language generation more attuned to constraints that are substantive rather than stylistic.

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
Yizhe Zhang | Guoyin Wang | Chunyuan Li | Zhe Gan | Chris Brockett | Bill Dolan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation. However, these models cannot be directly employed to generate text under specified lexical constraints. To address this challenge, we present POINTER (PrOgressive INsertion-based TransformER), a simple yet novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. This procedure is recursively applied until a sequence is completed. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable. We pre-train our model with the proposed progressive insertion-based objective on a 12GB Wikipedia dataset, and fine-tune it on downstream hard-constrained generation tasks. Non-autoregressive decoding yields logarithmic time complexity at inference. Experimental results on both News and Yelp datasets demonstrate that POINTER achieves state-of-the-art performance on constrained text generation. We release the pre-trained models and the source code to facilitate future research.
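
The progressive insertion process can be pictured with a toy driver that repeatedly asks a proposal function for a token to place in each gap until no gap accepts one; the propose function here is a hand-written stand-in, not the pretrained POINTER model.

    # Toy driver for progressive, coarse-to-fine insertion between existing tokens.
    # `propose` stands in for the trained insertion model.
    def progressive_insertion(keywords, propose, max_rounds=5):
        tokens = list(keywords)                   # start from the lexical constraints
        for _ in range(max_rounds):
            new_tokens, inserted = [], False
            for i, tok in enumerate(tokens):
                new_tokens.append(tok)
                gap_token = propose(tokens, i)    # token for the gap after position i
                if gap_token is not None:
                    new_tokens.append(gap_token)
                    inserted = True
            tokens = new_tokens
            if not inserted:                      # converged: no gap accepted a token
                break
        return " ".join(tokens)

    def propose(tokens, i):                       # hypothetical rule, for demonstration
        return "very" if tokens[i] == "is" and "very" not in tokens else None

    print(progressive_insertion(["food", "is", "delicious"], propose))
    # food is very delicious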

2019

Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
Lianhui Qin | Michel Galley | Chris Brockett | Xiaodong Liu | Xiang Gao | Bill Dolan | Yejin Choi | Jianfeng Gao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Although neural conversational models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous. We present a new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading. The key idea is to provide the conversation model with relevant long-form text on the fly as a source of external knowledge. The model performs QA-style reading comprehension on this text in response to each conversational turn, thereby allowing for more focused integration of external knowledge than has been possible in prior approaches. To support further research on knowledge-grounded conversation, we introduce a new large-scale conversation dataset grounded in external web pages (2.8M turns, 7.4M sentences of grounding). Both human evaluation and automated metrics show that our approach results in more contentful responses compared to a variety of previous methods, improving both the informativeness and diversity of generated output.

Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Vighnesh Leonardo Shiv | Chris Quirk | Anshuman Suri | Xiang Gao | Khuram Shahid | Nithya Govindarajan | Yizhe Zhang | Jianfeng Gao | Michel Galley | Chris Brockett | Tulasi Menon | Bill Dolan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.

Jointly Optimizing Diversity and Relevance in Neural Response Generation
Xiang Gao | Sungjin Lee | Yizhe Zhang | Chris Brockett | Michel Galley | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

Structuring Latent Spaces for Stylized Response Generation
Xiang Gao | Yizhe Zhang | Sungjin Lee | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.

Domain Adaptive Text Style Transfer
Dianqi Li | Yizhe Zhang | Zhe Gan | Yu Cheng | Chris Brockett | Bill Dolan | Ming-Ting Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Text style transfer without parallel data has achieved some practical success. However, in the scenario where less data is available, these methods may yield poor performance. In this paper, we examine domain adaptation for text style transfer to leverage massively available data from other domains. These data may demonstrate domain shift, which impedes the benefits of utilizing such data for training. To address this challenge, we propose simple yet effective domain adaptive text style transfer models, enabling domain-adaptive information exchange. The proposed models presumably learn from the source domain to: (i) distinguish stylized information and generic content information; (ii) maximally preserve content information; and (iii) adaptively transfer the styles in a domain-aware manner. We evaluate the proposed models on two style transfer tasks (sentiment and formality) over multiple target domains where only limited non-parallel data is available. Extensive experiments demonstrate the effectiveness of the proposed model compared to the baselines.

2018

Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints
Ashutosh Baheti | Alan Ritter | Jiwei Li | Bill Dolan
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Neural conversation models tend to generate safe, generic responses for most inputs. This is due to the limitations of likelihood-based decoding objectives in generation tasks with diverse outputs, such as conversation. To address this challenge, we propose a simple yet effective approach for incorporating side information in the form of distributional constraints over the generated responses. We propose two constraints that help generate more content-rich responses, based on a model of syntax and topics (Griffiths et al., 2005) and semantic similarity (Arora et al., 2016). We evaluate our approach against a variety of competitive baselines, using both automatic metrics and human judgments, showing that our proposed approach generates responses that are much less generic without sacrificing plausibility. A working demo of our code can be found at https://github.com/abaheti95/DC-NeuralConversation.
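
A simplified way to picture the effect of such constraints is as a re-ranking term added to the model's log-likelihood, rewarding candidates whose content stays close to the source message; the encoder, weight, and candidate scores below are assumptions, and the paper applies its topic and similarity constraints within decoding itself rather than by re-ranking.

    # Simplified re-ranking sketch: combine model log-likelihood with a
    # semantic-similarity bonus toward the source message.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")    # assumed generic encoder

    def rerank(source, candidates, log_likelihoods, weight=5.0):
        embs = encoder.encode([source] + candidates, normalize_embeddings=True)
        sims = embs[1:] @ embs[0]                        # cosine similarity to source
        scores = np.array(log_likelihoods) + weight * sims
        return [candidates[i] for i in np.argsort(-scores)]

    source = "What did you think of the new sci-fi movie?"
    candidates = ["I don't know.", "The special effects were stunning!", "Yes."]
    log_likelihoods = [-1.0, -4.0, -1.5]                 # illustrative model scores
    print(rerank(source, candidates, log_likelihoods))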

2017

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh | Chris Brockett | Bill Dolan | Michel Galley | Jianfeng Gao | Georgios Spithourakis | Lucy Vanderwende
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The popularity of image sharing on social media and the engagement it creates between users reflect the important role that visual context plays in everyday conversations. We present a novel task, Image Grounded Conversations (IGC), in which natural-sounding conversations are generated about a shared image. To benchmark progress, we introduce a new multiple reference dataset of crowd-sourced, event-centric conversations on images. IGC falls on the continuum between chit-chat and goal-directed conversation models, where visual grounding constrains the topic of conversation to event-driven utterances. Experiments with models trained on social media data show that the combination of visual and textual context enhances the quality of generated conversational turns. In human evaluation, the gap between human performance and that of both neural and retrieval architectures suggests that multi-modal IGC presents an interesting challenge for dialog research.

Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation Models
Yi Luan | Chris Brockett | Bill Dolan | Jianfeng Gao | Michel Galley
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training neural conversation models that leverages both conversation data across speakers and other types of data pertaining to the speaker and speaker roles to be modeled. Experiments show that our approach leads to significant improvements in model quality over baselines, generating responses that more precisely capture speakers’ traits and speaking styles. The model offers the benefits of being algorithmically simple and easy to implement, and of not relying on large quantities of data representing specific individual speakers.

2016

A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Persona-Based Neural Conversation Model
Jiwei Li | Michel Galley | Chris Brockett | Georgios Spithourakis | Jianfeng Gao | Bill Dolan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)
Wei Xu | Chris Callison-Burch | Bill Dolan
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Alessandro Sordoni | Michel Galley | Michael Auli | Chris Brockett | Yangfeng Ji | Margaret Mitchell | Jian-Yun Nie | Jianfeng Gao | Bill Dolan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Michel Galley | Chris Brockett | Alessandro Sordoni | Yangfeng Ji | Michael Auli | Chris Quirk | Margaret Mitchell | Jianfeng Gao | Bill Dolan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Extracting Lexically Divergent Paraphrases from Twitter
Wei Xu | Alan Ritter | Chris Callison-Burch | William B. Dolan | Yangfeng Ji
Transactions of the Association for Computational Linguistics, Volume 2

We present MultiP (Multi-instance Learning Paraphrase Model), a new model suited to identifying paraphrases within the short messages on Twitter. We jointly model paraphrase relations between word and sentence pairs and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve performance competitive with a state-of-the-art method that combines a latent space model with a feature-based supervised classifier. Our model also captures lexically divergent paraphrases that differ from yet complement previous methods; combining our model with previous work significantly outperforms the state-of-the-art. In addition, we present a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter. We make this new dataset available to the research community.

2013

Lightly Supervised Learning of Procedural Dialog Systems
Svitlana Volkova | Pallavi Choudhury | Chris Quirk | Bill Dolan | Luke Zettlemoyer
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning to Relate Literal and Sentimental Descriptions of Visual Properties
Mark Yatskar | Svitlana Volkova | Asli Celikyilmaz | Bill Dolan | Luke Zettlemoyer
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Paraphrasing for Style
Wei Xu | Alan Ritter | Bill Dolan | Ralph Grishman | Colin Cherry
Proceedings of COLING 2012

CLex: A Lexicon for Exploring Color, Concept and Emotion Associations in Language
Svitlana Volkova | William B. Dolan | Theresa Wilson
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Collecting Highly Parallel Data for Paraphrase Evaluation
David Chen | William Dolan
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Data-Driven Response Generation in Social Media
Alan Ritter | Colin Cherry | William B. Dolan
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Unsupervised Modeling of Twitter Conversations
Alan Ritter | Colin Cherry | Bill Dolan
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Using Contextual Speller Techniques and Language Modeling for ESL Error Correction
Michael Gamon | Jianfeng Gao | Chris Brockett | Alexandre Klementiev | William B. Dolan | Dmitriy Belenko | Lucy Vanderwende
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

A Web-based English Proofing System for English as a Second Language Users
Xing Yi | Jianfeng Gao | William B. Dolan
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

2007

Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing
Satoshi Sekine | Kentaro Inui | Ido Dagan | Bill Dolan | Danilo Giampiccolo | Bernardo Magnini
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

The Third PASCAL Recognizing Textual Entailment Challenge
Danilo Giampiccolo | Bernardo Magnini | Ido Dagan | Bill Dolan
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Correcting ESL Errors Using Phrasal SMT Techniques
Chris Brockett | William B. Dolan | Michael Gamon
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Support Vector Machines for Paraphrase Identification and Corpus Construction
Chris Brockett | William B. Dolan
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

Automatically Constructing a Corpus of Sentential Paraphrases
William B. Dolan | Chris Brockett
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment
Bill Dolan | Ido Dagan
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment

2004

Monolingual Machine Translation for Paraphrase Generation
Chris Quirk | Chris Brockett | William Dolan
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
Bill Dolan | Chris Quirk | Chris Brockett
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2001

Overcoming the customization bottleneck using example-based MT
Stephen D. Richardson | William B. Dolan | Arul Menezes | Monica Corston-Oliver
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

Achieving commercial-quality translation with example-based methods
Stephen Richardson | William Dolan | Arul Menezes | Jessie Pinkham
Proceedings of Machine Translation Summit VIII

We describe MSR-MT, a large-scale example-based machine translation system under development for several language pairs. MSR-MT is trained on aligned English-Spanish technical prose; a blind evaluation shows that its integration of rule-based parsers, example-based processing, and statistical techniques produces translations whose quality in this domain exceeds that of uncustomized commercial MT systems.

1999

Less is more: Eliminating index terms from subordinate clauses
Simon H. Corston-Oliver | William B. Dolan
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

MindNet: Acquiring and Structuring Semantic Information from Text
Stephen D. Richardson | William B. Dolan | Lucy Vanderwende
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

MindNet: acquiring and structuring semantic information from text
Stephen D. Richardson | William B. Dolan | Lucy Vanderwende
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

1994

Word Sense Ambiguation: Clustering Related Senses
William B. Dolan
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1993

Combining Dictionary-Based and Example-Based Methods for Natural Language Analysis
Stephen D. Richardson | Lucy Vanderwende | William Dolan
Proceedings of the Fifth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages
