Ximing Lu


pdf bib
Generated Knowledge Prompting for Commonsense Reasoning
Jiacheng Liu | Alisa Liu | Ximing Lu | Sean Welleck | Peter West | Ronan Le Bras | Yejin Choi | Hannaneh Hajishirzi
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It remains an open question whether incorporating external knowledge benefits commonsense reasoning while maintaining the flexibility of pretrained sequence models. To investigate this question, we develop generated knowledge prompting, which consists of generating knowledge from a language model, then providing the knowledge as additional input when answering a question. Our method does not require task-specific supervision for knowledge integration, or access to a structured knowledge base, yet it improves performance of large-scale, state-of-the-art models on four commonsense reasoning tasks, achieving state-of-the-art results on numerical commonsense (NumerSense), general commonsense (CommonsenseQA 2.0), and scientific commonsense (QASC) benchmarks. Generated knowledge prompting highlights large-scale language models as flexible sources of external knowledge for improving commonsense reasoning.Our code is available at github.com/liujch1998/GKP


pdf bib
On-the-Fly Attention Modulation for Neural Generation
Yue Dong | Chandra Bhagavatula | Ximing Lu | Jena D. Hwang | Antoine Bosselut | Jackie Chi Kit Cheung | Yejin Choi
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
Ximing Lu | Peter West | Rowan Zellers | Ronan Le Bras | Chandra Bhagavatula | Yejin Choi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Conditional text generation often requires lexical constraints, i.e., which words should or shouldn’t be included in the output text. While the dominant recipe for conditional text generation has been large-scale pretrained language models that are finetuned on the task-specific training data, such models do not learn to follow the underlying constraints reliably, even when supervised with large amounts of task-specific examples. We propose NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models – supervised or not – to generate fluent text while satisfying complex lexical constraints. Our approach is powerful yet efficient. It handles any set of lexical constraints that is expressible under predicate logic, while its asymptotic runtime is equivalent to conventional beam search. Empirical results on four benchmarks show that NeuroLogic Decoding outperforms previous approaches, including algorithms that handle a subset of our constraints. Moreover, we find that unsupervised models with NeuroLogic Decoding often outperform supervised models with conventional decoding, even when the latter is based on considerably larger networks. Our results suggest the limit of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.

pdf bib
Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models
Peter West | Ximing Lu | Ari Holtzman | Chandra Bhagavatula | Jena D. Hwang | Yejin Choi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Publicly available, large pretrained Language Models (LMs) generate text with remarkable quality, but only sequentially from left to right. As a result, they are not immediately applicable to generation tasks that break the unidirectional assumption, such as paraphrasing or text-infilling, necessitating task-specific supervision. In this paper, we present Reflective Decoding, a novel unsupervised algorithm that allows for direct application of unidirectional LMs to non-sequential tasks. Our 2-step approach requires no supervision or even parallel corpora, only two off-the-shelf pretrained LMs in opposite directions: forward and backward. First, in the contextualization step, we use LMs to generate ensembles of past and future contexts which collectively capture the input (e.g. the source sentence for paraphrasing). Second, in the reflection step, we condition on these “context ensembles”, generating outputs that are compatible with them. Comprehensive empirical results demonstrate that Reflective Decoding outperforms strong unsupervised baselines on both paraphrasing and abductive text infilling, significantly narrowing the gap between unsupervised and supervised methods. Reflective Decoding surpasses multiple supervised baselines on various metrics including human evaluation.

pdf bib
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Alisa Liu | Maarten Sap | Ximing Lu | Swabha Swayamdipta | Chandra Bhagavatula | Noah A. Smith | Yejin Choi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with “expert” LMs and/or “anti-expert” LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if they are considered likely by the experts, and unlikely by the anti-experts. We apply DExperts to language detoxification and sentiment-controlled generation, where we outperform existing controllable generation methods on both automatic and human evaluations. Moreover, because DExperts operates only on the output of the pretrained LM, it is effective with (anti-)experts of smaller size, including when operating on GPT-3. Our work highlights the promise of tuning small LMs on text with (un)desirable attributes for efficient decoding-time steering.