Chris Brockett


Craft an Iron Sword: Dynamically Generating Interactive Game Characters by Prompting Large Language Models Tuned on Code
Ryan Volum | Sudha Rao | Michael Xu | Gabriel DesGarennes | Chris Brockett | Benjamin Van Durme | Olivia Deng | Akanksha Malhotra | Bill Dolan
Proceedings of the 3rd Wordplay: When Language Meets Games Workshop (Wordplay 2022)

Non-Player Characters (NPCs) significantly enhance the player experience in many games. Historically, players’ interactions with NPCs have tended to be highly scripted, to be limited to natural language responses to be selected by the player, and to not involve dynamic change in game state. In this work, we demonstrate that use of a few example conversational prompts can power a conversational agent to generate both natural language and novel code. This approach can permit development of NPCs with which players can have grounded conversations that are free-form and less repetitive. We demonstrate our approach using OpenAI Codex (GPT-3 finetuned on GitHub), with Minecraft game development as our test bed. We show that with a few example prompts, a Codex-based agent can generate novel code, hold multi-turn conversations and answer questions about structured data. We evaluate this application using experienced gamers in a Minecraft realm and provide analysis of failure cases and suggest possible directions for solutions.

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu | Yizhe Zhang | Chris Brockett | Yi Mao | Zhifang Sui | Weizhu Chen | Bill Dolan
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large pretrained generative models like GPT-3 often suffer from hallucinating non-existent or incorrect content, which undermines their potential merits in real applications. Existing work usually attempts to detect these hallucinations based on a corresponding oracle reference at a sentence or document level. However, ground-truth references may not be readily available for many free-form text generation applications, and sentence- or document-level detection may fail to provide the fine-grained signals that would prevent fallacious content in real time. As a first step to addressing these issues, we propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDeS (HAllucination DEtection dataSet). To create this dataset, we first perturb a large number of text segments extracted from English language Wikipedia, and then verify these with crowd-sourced annotations. To mitigate label imbalance during annotation, we utilize an iterative model-in-loop strategy. We conduct comprehensive data analyses and create multiple baseline models.


Automatic Document Sketching: Generating Drafts from Analogous Texts
Zeqiu Wu | Michel Galley | Chris Brockett | Yizhe Zhang | Bill Dolan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Proceedings of the 1st Workshop on NLP for Positive Impact
Anjalie Field | Shrimai Prabhumoye | Maarten Sap | Zhijing Jin | Jieyu Zhao | Chris Brockett
Proceedings of the 1st Workshop on NLP for Positive Impact

Contextualized Perturbation for Textual Adversarial Attack
Dianqi Li | Yizhe Zhang | Hao Peng | Liqun Chen | Chris Brockett | Ming-Ting Sun | Bill Dolan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques of generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. We propose three contextualized perturbations, Replace, Insert and Merge, that allow for generating outputs of varied lengths. CLARE can flexibly combine these perturbations and apply them at any position in the inputs, and is thus able to attack the victim model more effectively with fewer edits. Extensive experiments and human evaluation demonstrate that CLARE outperforms the baselines in terms of attack success rate, textual similarity, fluency and grammaticality.

Text Editing by Command
Felix Faltings | Michel Galley | Gerold Hintz | Chris Brockett | Chris Quirk | Jianfeng Gao | Bill Dolan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model’s performance.


Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao | Yizhe Zhang | Michel Galley | Chris Brockett | Bill Dolan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more followup interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between the feedback and engagingness, we convert the ranking problem to a comparison of response pairs which involve few confounding factors. We trained DialogRPT, a set of GPT-2 based models, on 133M pairs of human feedback data, and the resulting ranker outperformed several baselines. In particular, our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback. Finally, we combine the feedback prediction models and a human-like scoring model to rank the machine-generated dialog responses. Crowd-sourced human evaluation shows that our ranking method correlates better with real human preferences than baseline models.

POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
Yizhe Zhang | Guoyin Wang | Chunyuan Li | Zhe Gan | Chris Brockett | Bill Dolan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Large-scale pre-trained language models, such as BERT and GPT-2, have achieved excellent performance in language representation learning and free-form text generation. However, these models cannot be directly employed to generate text under specified lexical constraints. To address this challenge, we present POINTER (PrOgressive INsertion-based TransformER), a simple yet novel insertion-based approach for hard-constrained text generation. The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner. This procedure is recursively applied until a sequence is completed. The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable. We pre-train our model with the proposed progressive insertion-based objective on a 12GB Wikipedia dataset, and fine-tune it on downstream hard-constrained generation tasks. Non-autoregressive decoding yields a logarithmic time complexity during inference time. Experimental results on both News and Yelp datasets demonstrate that POINTER achieves state-of-the-art performance on constrained text generation. We released the pre-trained models and the source code to facilitate future research.

A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks
Angela Lin | Sudha Rao | Asli Celikyilmaz | Elnaz Nouri | Chris Brockett | Debadeepta Dey | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many high-level procedural tasks can be decomposed into sequences of instructions that vary in their order and choice of tools. In the cooking domain, the web offers many, partially-overlapping, text and video recipes (i.e. procedures) that describe how to make the same dish (i.e. high-level task). Aligning instructions for the same dish across different sources can yield descriptive visual explanations that are far richer semantically than conventional textual instructions, providing commonsense insight into how real-world procedures are structured. Learning to align these different instruction sets is challenging because: a) different recipes vary in their order of instructions and use of ingredients; and b) video instructions can be noisy and tend to contain far more information than text instructions. To address these challenges, we use an unsupervised alignment algorithm that learns pairwise alignments between instructions of different recipes for the same dish. We then use a graph algorithm to derive a joint alignment between multiple text and multiple video recipes for the same dish. We release the Microsoft Research Multimodal Aligned Recipe Corpus containing ~150K pairwise alignments between recipes across 4262 dishes with rich commonsense information.

DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Yizhe Zhang | Siqi Sun | Michel Galley | Yen-Chun Chen | Chris Brockett | Xiang Gao | Jianfeng Gao | Jingjing Liu | Bill Dolan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.


Jointly Optimizing Diversity and Relevance in Neural Response Generation
Xiang Gao | Sungjin Lee | Yizhe Zhang | Chris Brockett | Michel Galley | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

Structuring Latent Spaces for Stylized Response Generation
Xiang Gao | Yizhe Zhang | Sungjin Lee | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.

Domain Adaptive Text Style Transfer
Dianqi Li | Yizhe Zhang | Zhe Gan | Yu Cheng | Chris Brockett | Bill Dolan | Ming-Ting Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Text style transfer without parallel data has achieved some practical success. However, in the scenario where less data is available, these methods may yield poor performance. In this paper, we examine domain adaptation for text style transfer to leverage massively available data from other domains. These data may demonstrate domain shift, which impedes the benefits of utilizing such data for training. To address this challenge, we propose simple yet effective domain adaptive text style transfer models, enabling domain-adaptive information exchange. The proposed models presumably learn from the source domain to: (i) distinguish stylized information and generic content information; (ii) maximally preserve content information; and (iii) adaptively transfer the styles in a domain-aware manner. We evaluate the proposed models on two style transfer tasks (sentiment and formality) over multiple target domains where only limited non-parallel data is available. Extensive experiments demonstrate the effectiveness of the proposed model compared to the baselines.

Generating a Common Question from Multiple Documents using Multi-source Encoder-Decoder Models
Woon Sang Cho | Yizhe Zhang | Sudha Rao | Chris Brockett | Sungjin Lee
Proceedings of the 3rd Workshop on Neural Generation and Translation

Ambiguous user queries in search engines result in the retrieval of documents that often span multiple topics. One potential solution is for the search engine to generate multiple refined queries, each of which relates to a subset of the documents spanning the same topic. A preliminary step towards this goal is to generate a question that captures common concepts of multiple documents. We propose a new task of generating a common question from multiple documents and present a simple variant of an existing multi-source encoder-decoder framework, called the Multi-Source Question Generator (MSQG). We first train an RNN-based single encoder-decoder generator from (single document, question) pairs. At test time, given multiple documents, the Distribute step of our MSQG model predicts target word distributions for each document using the trained model. The Aggregate step aggregates these distributions to generate a common question. This simple yet effective strategy significantly outperforms several existing baseline models applied to the new task when evaluated using automated metrics and human judgments on the MS-MARCO-QA dataset.

Towards Coherent and Cohesive Long-form Text Generation
Woon Sang Cho | Pengchuan Zhang | Yizhe Zhang | Xiujun Li | Michel Galley | Chris Brockett | Mengdi Wang | Jianfeng Gao
Proceedings of the First Workshop on Narrative Understanding

Generating coherent and cohesive long-form texts is a challenging task. Previous works relied on large amounts of human-generated texts to train neural language models. However, few attempted to explicitly improve neural language models from the perspectives of coherence and cohesion. In this work, we propose a new neural language model that is equipped with two neural discriminators which provide feedback signals at the levels of sentence (cohesion) and paragraph (coherence). Our model is trained using a simple yet efficient variant of policy gradient, called ‘negative-critical sequence training’, which is proposed to eliminate the need for training a separate critic to estimate the ‘baseline’. Results demonstrate the effectiveness of our approach, showing improvements over a strong baseline, a recurrent attention-based bidirectional MLE-trained neural language model.

Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
Lianhui Qin | Michel Galley | Chris Brockett | Xiaodong Liu | Xiang Gao | Bill Dolan | Yejin Choi | Jianfeng Gao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Although neural conversational models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous. We present a new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading. The key idea is to provide the conversation model with relevant long-form text on the fly as a source of external knowledge. The model performs QA-style reading comprehension on this text in response to each conversational turn, thereby allowing for more focused integration of external knowledge than has been possible in prior approaches. To support further research on knowledge-grounded conversation, we introduce a new large-scale conversation dataset grounded in external web pages (2.8M turns, 7.4M sentences of grounding). Both human evaluation and automated metrics show that our approach results in more contentful responses compared to a variety of previous methods, improving both the informativeness and diversity of generated output.

Microsoft Icecaps: An Open-Source Toolkit for Conversation Modeling
Vighnesh Leonardo Shiv | Chris Quirk | Anshuman Suri | Xiang Gao | Khuram Shahid | Nithya Govindarajan | Yizhe Zhang | Jianfeng Gao | Michel Galley | Chris Brockett | Tulasi Menon | Bill Dolan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

The Intelligent Conversation Engine: Code and Pre-trained Systems (Microsoft Icecaps) is an upcoming open-source natural language processing repository. Icecaps wraps TensorFlow functionality in a modular component-based architecture, presenting an intuitive and flexible paradigm for constructing sophisticated learning setups. Capabilities include multitask learning between models with shared parameters, upgraded language model decoding features, a range of built-in architectures, and a user-friendly data processing pipeline. The system is targeted toward conversational tasks, exploring diverse response generation, coherence, and knowledge grounding. Icecaps also provides pre-trained conversational models that can be either used directly or loaded for fine-tuning or bootstrapping other models; these models power an online demo of our framework.


Steering Output Style and Topic in Neural Response Generation
Di Wang | Nebojsa Jojic | Chris Brockett | Eric Nyberg
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose simple and flexible training and decoding methods for influencing output style and topic in neural encoder-decoder based language generation. This capability is desirable in a variety of applications, including conversational systems, where successful agents need to produce language in a specific style and generate responses steered by a human puppeteer or external knowledge. We decompose the neural generation process into empirically easier sub-problems: a faithfulness model and a decoding method based on selective-sampling. We also describe training and sampling algorithms that bias the generation process with a specific language style restriction, or a topic restriction. Human evaluation results show that our proposed methods are able to restrict style and topic without degrading output quality in conversational tasks.

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh | Chris Brockett | Bill Dolan | Michel Galley | Jianfeng Gao | Georgios Spithourakis | Lucy Vanderwende
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The popularity of image sharing on social media and the engagement it creates between users reflect the important role that visual context plays in everyday conversations. We present a novel task, Image Grounded Conversations (IGC), in which natural-sounding conversations are generated about a shared image. To benchmark progress, we introduce a new multiple reference dataset of crowd-sourced, event-centric conversations on images. IGC falls on the continuum between chit-chat and goal-directed conversation models, where visual grounding constrains the topic of conversation to event-driven utterances. Experiments with models trained on social media data show that the combination of visual and textual context enhances the quality of generated conversational turns. In human evaluation, the gap between human performance and that of both neural and retrieval architectures suggests that multi-modal IGC presents an interesting challenge for dialog research.

Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation Models
Yi Luan | Chris Brockett | Bill Dolan | Jianfeng Gao | Michel Galley
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training neural conversation models that leverages both conversation data across speakers and other types of data pertaining to the speaker and speaker roles to be modeled. Experiments show that our approach leads to significant improvements over baseline model quality, generating responses that capture more precisely speakers’ traits and speaking styles. The model offers the benefits of being algorithmically simple and easy to implement, and not relying on large quantities of data representing specific individual speakers.


A Persona-Based Neural Conversation Model
Jiwei Li | Michel Galley | Chris Brockett | Georgios Spithourakis | Jianfeng Gao | Bill Dolan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs
Kristina Toutanova | Chris Brockett | Ke M. Tran | Saleema Amershi
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li | Michel Galley | Chris Brockett | Jianfeng Gao | Bill Dolan
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Alessandro Sordoni | Michel Galley | Michael Auli | Chris Brockett | Yangfeng Ji | Margaret Mitchell | Jian-Yun Nie | Jianfeng Gao | Bill Dolan
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets
Michel Galley | Chris Brockett | Alessandro Sordoni | Yangfeng Ji | Michael Auli | Chris Quirk | Margaret Mitchell | Jianfeng Gao | Bill Dolan
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)


Diverse Words, Shared Meanings: Statistical Machine Translation for Paraphrase, Grounding, and Intent
Chris Brockett
Proceedings of the Australasian Language Technology Association Workshop 2012


Hitting the Right Paraphrases in Good Time
Stanley Kok | Chris Brockett
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics


User Input and Interactions on Microsoft Research ESL Assistant
Claudia Leacock | Michael Gamon | Chris Brockett
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications


Using Contextual Speller Techniques and Language Modeling for ESL Error Correction
Michael Gamon | Jianfeng Gao | Chris Brockett | Alexandre Klementiev | William B. Dolan | Dmitriy Belenko | Lucy Vanderwende
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I


Correcting ESL Errors Using Phrasal SMT Techniques
Chris Brockett | William B. Dolan | Michael Gamon
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics


Support Vector Machines for Paraphrase Identification and Corpus Construction
Chris Brockett | William B. Dolan
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

Automatically Constructing a Corpus of Sentential Paraphrases
William B. Dolan | Chris Brockett
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)


Monolingual Machine Translation for Paraphrase Generation
Chris Quirk | Chris Brockett | William Dolan
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
Bill Dolan | Chris Quirk | Chris Brockett
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics


English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations
Chris Brockett | Takako Aikawa | Anthony Aue | Arul Menezes | Chris Quirk | Hisami Suzuki
COLING-02: Machine Translation in Asia


A Machine Learning Approach to the Automatic Evaluation of Machine Translation
Simon Corston-Oliver | Michael Gamon | Chris Brockett
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics


Robust Segmentation of Japanese Text into a Lattice for Parsing
Gary Kacmarcik | Chris Brockett | Hisami Suzuki
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Using a Broad-Coverage Parser for Word-Breaking in Japanese
Hisami Suzuki | Chris Brockett | Gary Kacmarcik
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics