Michael White

Also published as: Mike White


2024

pdf bib
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
Ziru Chen | Michael White | Ray Mooney | Ali Payani | Yu Su | Huan Sun
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method. We investigate the practical utility of two advanced planning methods, iterative correction and tree search. We present a comprehensive analysis of how discrimination accuracy affects the overall performance of agents when using these two methods or a simpler method, re-ranking. Experiments on two tasks, text-to-SQL parsing and mathematical reasoning, show that: (1) advanced planning methods demand discriminators with at least 90% accuracy to achieve significant improvements over re-ranking; (2) current LLMs’ discrimination abilities have not met the needs of advanced planning methods to achieve such improvements; (3) with LLM-based discriminators, advanced planning methods may not adequately balance accuracy and efficiency. For example, compared to the other two methods, tree search is at least 10–20 times slower but leads to negligible performance gains, which hinders its real-world applications.

pdf bib
Insights of a Usability Study for KBQA Interactive Semantic Parsing: Generation Yields Benefits over Templates but External Validity Remains Challenging
Ashley Lewis | Lingbo Mo | Marie-Catherine de Marneffe | Huan Sun | Michael White
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024

We present our findings from a usability study of an interactive semantic parsing system for knowledge based question answering (KBQA). The system is designed to help users access information within a knowledge base without having to know its query language. The system translates the user’s question into the query language, retrieves an answer, then presents an English explanation of the process so that the user can make corrections if necessary. To our knowledge, our work is the most thorough usability study conducted for such a system and the only one that uses crowdworkers as participants to verify that the system is usable for average users. Our crowdworkers participate in KBQA dialogues using 4 versions of a system based on the framework by Mo et al. (2022) and answer surveys about their experiences. Some key takeaways from this work are: 1) we provide evidence for the benefits of interactivity in semantic parsing with human users and using generated questions in lieu of templated representations, 2) we identify limitations of simulations and provide contrasting evidence from actual system use, and 3) we provide an examination of crowdsourcing methodology, in particular the trade-offs of using crowdworkers vs. a specially trained group of evaluators.

pdf bib
OSU CompLing at the GEM’24 Data-to-Text Task
Alyssa Allen | Ashley Lewis | Yi-Chien Lin | Tomiris Kaumenova | Michael White
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges

This paper details experiments conducted for completing the GEM 2024 Data-to-Text task for a WebNLG dataset (Gardent et al., 2017). We show that model performance varies greatly across English, Spanish, Chinese, and Russian. Data filtering was done with automatic model judgments via error detection, which performs differently per language. We report English and Spanish dev set results for a data filtering and knowledge distillation approach to generating natural language outputs for sets of triples across a variety of domains. Specifically, we compare three generation conditions: 1) few-shot prompting with ChatGPT (GPT4), 2) fine-tuning LLama2 on the unfiltered dataset, and 3) fine-tuning Llama2 on a filtered version of the dataset. Russian and Chinese efforts did not result in submissions due to inconsistent or incoherent translations being produced in either the data synthesis or final generation stages. We provide details on these shortcomings but largely focus on Spanish and English efforts that align with our task submissions. We ultimately submitted outputs in English and Spanish that were generated using a version of Llama2 fine-tuned on a filtered dataset.

2023

pdf bib
Text-to-SQL Error Correction with Language Models of Code
Ziru Chen | Shijie Chen | Michael White | Raymond Mooney | Ali Payani | Jayanth Srinivasa | Yu Su | Huan Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead. Besides, while most language models of code are not specifically pre-trained for SQL, they know common data structures and their operations in programming languages such as Python. Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code. Our error correction model improves the exact set match accuracy of different parsers by 2.4-6.5 and obtains up to 4.3 point absolute improvement over two strong baselines.

pdf bib
Bootstrapping a Conversational Guide for Colonoscopy Prep
Pulkit Arya | Madeleine Bloomquist | Subhankar Chakraborty | Andrew Perrault | William Schuler | Eric Fosler-Lussier | Michael White
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Creating conversational systems for niche domains is a challenging task, further exacerbated by a lack of quality datasets. We explore the construction of safer conversational systems for guiding patients in preparing for colonoscopies. This has required a data generation pipeline to generate a minimum viable dataset to bootstrap a semantic parser, augmented by automatic paraphrasing. Our study suggests large language models (e.g., GPT-3.5 and GPT-4) are a viable alternative to crowd sourced paraphrasing, but conversational systems that rely upon language models’ ability to do temporal reasoning struggle to provide accurate responses. A neural-symbolic system that performs temporal reasoning on an intermediate representation of user queries shows promising results compared to an end-to-end dialogue system, improving the number of correct responses while vastly reducing the number of incorrect or misleading ones.

pdf bib
Mitigating Harms of LLMs via Knowledge Distillation for a Virtual Museum Tour Guide
Ashley Lewis | Michael White
Proceedings of the 1st Workshop on Taming Large Language Models: Controllability in the era of Interactive Assistants!

LLMs are known to be very powerful, exhibiting both great benefits and great risk. We seek to leverage the benefits, in particular the ability to be fluent, conversational dialogue agents, while minimizing the risks, such as hallucination and toxic content. In this work we use knowledge distillation to create a virtual museum tour guide dialogue agent, employing ChatGPT as a teacher model for a smaller student model, T5-large. We find the T5 model shows competitive performance, significantly reduces instances of hallucination, and shows promise for reducing toxic content.

2022

pdf bib
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Sebastian Gehrmann | Abhik Bhattacharjee | Abinaya Mahendiran | Alex Wang | Alexandros Papangelis | Aman Madaan | Angelina Mcmillan-major | Anna Shvets | Ashish Upadhyay | Bernd Bohnet | Bingsheng Yao | Bryan Wilie | Chandra Bhagavatula | Chaobin You | Craig Thomson | Cristina Garbacea | Dakuo Wang | Daniel Deutsch | Deyi Xiong | Di Jin | Dimitra Gkatzia | Dragomir Radev | Elizabeth Clark | Esin Durmus | Faisal Ladhak | Filip Ginter | Genta Indra Winata | Hendrik Strobelt | Hiroaki Hayashi | Jekaterina Novikova | Jenna Kanerva | Jenny Chim | Jiawei Zhou | Jordan Clive | Joshua Maynez | João Sedoc | Juraj Juraska | Kaustubh Dhole | Khyathi Raghavi Chandu | Laura Perez Beltrachini | Leonardo F . R. Ribeiro | Lewis Tunstall | Li Zhang | Mahim Pushkarna | Mathias Creutz | Michael White | Mihir Sanjay Kale | Moussa Kamal Eddine | Nico Daheim | Nishant Subramani | Ondrej Dusek | Paul Pu Liang | Pawan Sasanka Ammanamanchi | Qi Zhu | Ratish Puduppully | Reno Kriz | Rifat Shahriyar | Ronald Cardenas | Saad Mahamood | Salomey Osei | Samuel Cahyawijaya | Sanja Štajner | Sebastien Montella | Shailza Jolly | Simon Mille | Tahmid Hasan | Tianhao Shen | Tosin Adewumi | Vikas Raunak | Vipul Raheja | Vitaly Nikolaev | Vivian Tsai | Yacine Jernite | Ying Xu | Yisi Sang | Yixin Liu | Yufang Hou
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. The compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We pose that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages, ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.

pdf bib
Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction
Lingbo Mo | Ashley Lewis | Huan Sun | Michael White
Findings of the Association for Computational Linguistics: ACL 2022

Existing studies on semantic parsing focus on mapping a natural-language utterance to a logical form (LF) in one turn. However, because natural language may contain ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted LF step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user trust the final answer. We construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset. Our experiments show that this framework has the potential to greatly improve overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to evaluate our framework w.r.t. a variety of state-of-the-art KBQA models without further crowdsourcing effort. The results demonstrate that our framework promises to be effective across such models.

pdf bib
Generating Discourse Connectives with Pre-trained Language Models: Conditioning on Discourse Relations Helps Reconstruct the PDTB
Symon Stevens-Guille | Aleksandre Maskharashvili | Xintong Li | Michael White
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

We report results of experiments using BART (Lewis et al., 2019) and the Penn Discourse Tree Bank (Webber et al., 2019) (PDTB) to generate texts with correctly realized discourse relations. We address a question left open by previous research (Yung et al., 2021; Ko and Li, 2020) concerning whether conditioning the model on the intended discourse relation—which corresponds to adding explicit discourse relation information into the input to the model—improves its performance. Our results suggest that including discourse relation information in the input of the model significantly improves the consistency with which it produces a correctly realized discourse relation in the output. We compare our models’ performance to known results concerning the discourse structures found in written text and their possible explanations in terms of discourse interpretation strategies hypothesized in the psycholinguistics literature. Our findings suggest that natural language generation models based on current pre-trained Transformers will benefit from infusion with discourse level information if they aim to construct discourses with the intended relations.

2021

pdf bib
Building Adaptive Acceptability Classifiers for Neural NLG
Soumya Batra | Shashank Jain | Peyman Heidari | Ankit Arun | Catharine Youngs | Xintong Li | Pinar Donmez | Shawn Mei | Shiunzu Kuo | Vikas Bhardwaj | Anuj Kumar | Michael White
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We propose a novel framework to train models to classify acceptability of responses generated by natural language generation (NLG) models, improving upon existing sentence transformation and model-based approaches. An NLG response is considered acceptable if it is both semantically correct and grammatical. We don’t make use of any human references making the classifiers suitable for runtime deployment. Training data for the classifiers is obtained using a 2-stage approach of first generating synthetic data using a combination of existing and new model-based approaches followed by a novel validation framework to filter and sort the synthetic data into acceptable and unacceptable classes. Our 2-stage approach adapts to a wide range of data representations and does not require additional data beyond what the NLG models are trained on. It is also independent of the underlying NLG model architecture, and is able to generate more realistic samples close to the distribution of the NLG model-generated responses. We present results on 5 datasets (WebNLG, Cleaned E2E, ViGGO, Alarm, and Weather) with varying data representations. We compare our framework with existing techniques that involve synthetic data generation using simple sentence transformations and/or model-based techniques, and show that building acceptability classifiers using data that resembles the generation model outputs followed by a validation framework outperforms the existing techniques, achieving state-of-the-art results. We also show that our techniques can be used in few-shot settings using self-training.

pdf bib
Structure-to-Text Generation with Self-Training, Acceptability Classifiers and Context-Conditioning for the GEM Shared Task
Shreyan Bakshi | Soumya Batra | Peyman Heidari | Ankit Arun | Shashank Jain | Michael White
Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)

We explore the use of self-training and acceptability classifiers with pre-trained models for natural language generation in structure-to-text settings using three GEM datasets (E2E, WebNLG-en, Schema-Guided Dialog). With the Schema-Guided Dialog dataset, we also experiment with including multiple turns of context in the input. We find that self-training with reconstruction matching along with acceptability classifier filtering can improve semantic correctness, though gains are limited in the full-data setting. With context-conditioning, we find that including multiple turns in the context encourages the model to align with the user’s word and phrasing choices as well as to generate more self-consistent responses. In future versions of the GEM challenge, we encourage the inclusion of few-shot tracks to encourage research on data efficiency.

pdf bib
Neural Methodius Revisited: Do Discourse Relations Help with Pre-Trained Models Too?
Aleksandre Maskharashvili | Symon Stevens-Guille | Xintong Li | Michael White
Proceedings of the 14th International Conference on Natural Language Generation

Recent developments in natural language generation (NLG) have bolstered arguments in favor of re-introducing explicit coding of discourse relations in the input to neural models. In the Methodius corpus, a meaning representation (MR) is hierarchically structured and includes discourse relations. Meanwhile pre-trained language models have been shown to implicitly encode rich linguistic knowledge which provides an excellent resource for NLG. By virtue of synthesizing these lines of research, we conduct extensive experiments on the benefits of using pre-trained models and discourse relation information in MRs, focusing on the improvement of discourse coherence and correctness. We redesign the Methodius corpus; we also construct another Methodius corpus in which MRs are not hierarchically structured but flat. We report experiments on different versions of the corpora, which probe when, where, and how pre-trained models benefit from MRs with discourse relation information in them. We conclude that discourse relations significantly improve NLG when data is limited.

pdf bib
Self-Training for Compositional Neural NLG in Task-Oriented Dialogue
Xintong Li | Symon Stevens-Guille | Aleksandre Maskharashvili | Michael White
Proceedings of the 14th International Conference on Natural Language Generation

Neural approaches to natural language generation in task-oriented dialogue have typically required large amounts of annotated training data to achieve satisfactory performance, especially when generating from compositional inputs. To address this issue, we show that self-training enhanced with constrained decoding yields large gains in data efficiency on a conversational weather dataset that employs compositional meaning representations. In particular, our experiments indicate that self-training with constrained decoding can enable sequence-to-sequence models to achieve satisfactory quality using vanilla decoding with five to ten times less data than with ordinary supervised baseline; moreover, by leveraging pretrained models, data efficiency can be increased further to fifty times. We confirm the main automatic results with human evaluations and show that they extend to an enhanced, compositional version of the E2E dataset. The end result is an approach that makes it possible to achieve acceptable performance on compositional NLG tasks using hundreds rather than tens of thousands of training samples.

pdf bib
Getting to Production with Few-shot Natural Language Generation Models
Peyman Heidari | Arash Einolghozati | Shashank Jain | Soumya Batra | Lee Callender | Ankit Arun | Shawn Mei | Sonal Gupta | Pinar Donmez | Vikas Bhardwaj | Anuj Kumar | Michael White
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this paper, we study the utilization of pre-trained language models to enable few-shotNatural Language Generation (NLG) in task-oriented dialog systems. We introduce a system consisting of iterative self-training and an extensible mini-template framework that textualizes the structured input data into semi-natural text to fully take advantage of pre-trained language models. We compare var-ious representations of NLG models’ input and output and show that transforming the input and output to be similar to what the language model has seen before during pre-training improves the model’s few-shot performance substantially. We show that neural mod-els can be trained with as few as 300 annotated examples while providing high fidelity, considerably lowering the resource requirements for standing up a new domain or language. This level of data efficiency removes the need for crowd-sourced data collection resulting in higher quality data annotated by expert linguists. In addition, model maintenance and debugging processes will improve in this few-shot setting. Finally, we explore distillation and using a caching system to satisfy latency requirements of real-world systems.

2020

pdf bib
Best Practices for Data-Efficient Modeling in NLG:How to Train Production-Ready Neural Models with Less Data
Ankit Arun | Soumya Batra | Vikas Bhardwaj | Ashwini Challa | Pinar Donmez | Peyman Heidari | Hakan Inan | Shashank Jain | Anuj Kumar | Shawn Mei | Karthik Mohan | Michael White
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

Natural language generation (NLG) is a critical component in conversational systems, owing to its role of formulating a correct and natural text response. Traditionally, NLG components have been deployed using template-based solutions. Although neural network solutions recently developed in the research community have been shown to provide several benefits, deployment of such model-based solutions has been challenging due to high latency, correctness issues, and high data needs. In this paper, we present approaches that have helped us deploy data-efficient neural solutions for NLG in conversational systems to production. We describe a family of sampling and modeling techniques to attain production quality with light-weight neural network models using only a fraction of the data that would be necessary otherwise, and show a thorough comparison between each. Our results show that domain complexity dictates the appropriate approach to achieve high data efficiency. Finally, we distill the lessons from our experimental findings into a list of best practices for production-level NLG model development, and present them in a brief runbook. Importantly, the end products of all of the techniques are small sequence-to-sequence models (~2Mb) that we can reliably deploy in production. These models achieve the same quality as large pretrained models (~1Gb) as judged by human raters.

pdf bib
Neural NLG for Methodius: From RST Meaning Representations to Texts
Symon Stevens-Guille | Aleksandre Maskharashvili | Amy Isard | Xintong Li | Michael White
Proceedings of the 13th International Conference on Natural Language Generation

While classic NLG systems typically made use of hierarchically structured content plans that included discourse relations as central components, more recent neural approaches have mostly mapped simple, flat inputs to texts without representing discourse relations explicitly. In this paper, we investigate whether it is beneficial to include discourse relations in the input to neural data-to-text generators for texts where discourse relations play an important role. To do so, we reimplement the sentence planning and realization components of a classic NLG system, Methodius, using LSTM sequence-to-sequence (seq2seq) models. We find that although seq2seq models can learn to generate fluent and grammatical texts remarkably well with sufficiently representative Methodius training data, they cannot learn to correctly express Methodius’s similarity and contrast comparisons unless the corresponding RST relations are included in the inputs. Additionally, we experiment with using self-training and reverse model reranking to better handle train/test data mismatches, and find that while these methods help reduce content errors, it remains essential to include discourse relations in the input to obtain optimal performance.

pdf bib
Leveraging Large Pretrained Models for WebNLG 2020
Xintong Li | Aleksandre Maskharashvili | Symon Jory Stevens-Guille | Michael White
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)

In this paper, we report experiments on finetuning large pretrained models to realize resource description framework (RDF) triples to natural language. We provide the details of how to build one of the top-ranked English generation models in WebNLG Challenge 2020. We also show that there appears to be considerable potential for reranking to improve the current state of the art both in terms of statistical metrics and model-based metrics. Our human analyses of the generated texts show that for Russian, pretrained models showed some success, both in terms of lexical and morpho-syntactic choices for generation, as well as for content aggregation. Nevertheless, in a number of cases, the model can be unpredictable, both in terms of failure or success. Omissions of the content and hallucinations, which in many cases occurred at the same time, were major problems. By contrast, the models for English showed near perfect performance on the validation set.

2019

pdf bib
The OSU/Facebook Realizer for SRST 2019: Seq2Seq Inflection and Serialized Tree2Tree Linearization
Kartikeya Upasani | David King | Jinfeng Rao | Anusha Balakrishnan | Michael White
Proceedings of the 2nd Workshop on Multilingual Surface Realisation (MSR 2019)

We describe our exploratory system for the shallow surface realization task, which combines morphological inflection using character sequence-to-sequence models with a baseline linearizer that implements a tree-to-tree model using sequence-to-sequence models on serialized trees. Results for morphological inflection were competitive across languages. Due to time constraints, we could only submit complete results (including linearization) for English. Preliminary linearization results were decent, with a small benefit from reranking to prefer valid output trees, but inadequate control over the words in the output led to poor quality on longer sentences.

pdf bib
Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue
Anusha Balakrishnan | Jinfeng Rao | Kartikeya Upasani | Michael White | Rajen Subba
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Generating fluent natural language responses from structured semantic representations is a critical step in task-oriented conversational systems. Avenues like the E2E NLG Challenge have encouraged the development of neural approaches, particularly sequence-to-sequence (Seq2Seq) models for this problem. The semantic representations used, however, are often underspecified, which places a higher burden on the generation model for sentence planning, and also limits the extent to which generated responses can be controlled in a live system. In this paper, we (1) propose using tree-structured semantic representations, like those used in traditional rule-based NLG systems, for better discourse-level structuring and sentence-level planning; (2) introduce a challenging dataset using this representation for the weather domain; (3) introduce a constrained decoding approach for Seq2Seq models that leverages this representation to improve semantic correctness; and (4) demonstrate promising results on our dataset and the E2E dataset.

pdf bib
Evaluation Order Effects in Dynamic Continuized CCG: From Negative Polarity Items to Balanced Punctuation
Michael White
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

pdf bib
Proceedings of the 1st Workshop on Discourse Structure in Neural NLG
Anusha Balakrishnan | Vera Demberg | Chandra Khatri | Abhinav Rastogi | Donia Scott | Marilyn Walker | Michael White
Proceedings of the 1st Workshop on Discourse Structure in Neural NLG

pdf bib
A Tree-to-Sequence Model for Neural NLG in Task-Oriented Dialog
Jinfeng Rao | Kartikeya Upasani | Anusha Balakrishnan | Michael White | Anuj Kumar | Rajen Subba
Proceedings of the 12th International Conference on Natural Language Generation

Generating fluent natural language responses from structured semantic representations is a critical step in task-oriented conversational systems. Sequence-to-sequence models on flat meaning representations (MR) have been dominant in this task, for example in the E2E NLG Challenge. Previous work has shown that a tree-structured MR can improve the model for better discourse-level structuring and sentence-level planning. In this work, we propose a tree-to-sequence model that uses a tree-LSTM encoder to leverage the tree structures in the input MR, and further enhance the decoding by a structure-enhanced attention mechanism. In addition, we explore combining these enhancements with constrained decoding to improve semantic correctness. Our experiments not only show significant improvements over standard seq2seq baselines, but also is more data-efficient and generalizes better to hard scenarios.

2018

pdf bib
Madly Ambiguous: A Game for Learning about Structural Ambiguity and Why It’s Hard for Computers
Ajda Gokcen | Ethan Hill | Michael White
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Madly Ambiguous is an open source, online game aimed at teaching audiences of all ages about structural ambiguity and why it’s hard for computers. After a brief introduction to structural ambiguity, users are challenged to complete a sentence in a way that tricks the computer into guessing an incorrect interpretation. Behind the scenes are two different NLP-based methods for classifying the user’s input, one representative of classic rule-based approaches to disambiguation and the other representative of recent neural network approaches. Qualitative feedback from the system’s use in online, classroom, and science museum settings indicates that it is engaging and successful in conveying the intended take home messages. A demo of Madly Ambiguous can be played at http://madlyambiguous.osu.edu.

pdf bib
Using Paraphrasing and Memory-Augmented Models to Combat Data Sparsity in Question Interpretation with a Virtual Patient Dialogue System
Lifeng Jin | David King | Amad Hussein | Michael White | Douglas Danforth
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

When interpreting questions in a virtual patient dialogue system one must inevitably tackle the challenge of a long tail of relatively infrequently asked questions. To make progress on this challenge, we investigate the use of paraphrasing for data augmentation and neural memory-based classification, finding that the two methods work best in combination. In particular, we find that the neural memory-based approach not only outperforms a straight CNN classifier on low frequency questions, but also takes better advantage of the augmented data created by paraphrasing, together yielding a nearly 10% absolute improvement in accuracy on the least frequently asked questions.

pdf bib
The OSU Realizer for SRST ‘18: Neural Sequence-to-Sequence Inflection and Incremental Locality-Based Linearization
David King | Michael White
Proceedings of the First Workshop on Multilingual Surface Realisation

Surface realization is a nontrivial task as it involves taking structured data and producing grammatically and semantically correct utterances. Many competing grammar-based and statistical models for realization still struggle with relatively simple sentences. For our submission to the 2018 Surface Realization Shared Task, we tackle the shallow task by first generating inflected wordforms with a neural sequence-to-sequence model before incrementally linearizing them. For linearization, we use a global linear model trained using early update that makes use of features that take into account the dependency structure and dependency locality. Using this pipeline sufficed to produce surprisingly strong results in the shared task. In future work, we intend to pursue joint approaches to linearization and morphological inflection and incorporating a neural language model into the linearization choices.

pdf bib
LSTM Hypertagging
Reid Fu | Michael White
Proceedings of the 11th International Conference on Natural Language Generation

Hypertagging, or supertagging for surface realization, is the process of assigning lexical categories to nodes in an input semantic graph. Previous work has shown that hypertagging significantly increases realization speed and quality by reducing the search space of the realizer. Building on recent work using LSTMs to improve accuracy on supertagging for parsing, we develop an LSTM hypertagging method for OpenCCG, an open source NLP toolkit for CCG. Our results show significant improvements in both hypertagging accuracy and downstream realization performance.

2017

pdf bib
A Simple Method for Clarifying Sentences with Coordination Ambiguities
Michael White | Manjuan Duan | David L. King
Proceedings of the 1st Workshop on Explainable Computational Intelligence (XCI 2017)

pdf bib
Combining CNNs and Pattern Matching for Question Interpretation in a Virtual Patient Dialogue System
Lifeng Jin | Michael White | Evan Jaffe | Laura Zimmerman | Douglas Danforth
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

For medical students, virtual patient dialogue systems can provide useful training opportunities without the cost of employing actors to portray standardized patients. This work utilizes word- and character-based convolutional neural networks (CNNs) for question identification in a virtual patient dialogue system, outperforming a strong word- and character-based logistic regression baseline. While the CNNs perform well given sufficient training data, the best system performance is ultimately achieved by combining CNNs with a hand-crafted pattern matching system that is robust to label sparsity, providing a 10% boost in system accuracy and an error reduction of 47% as compared to the pattern-matching system alone.

pdf bib
Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems
Taylor Mahler | Willy Cheung | Micha Elsner | David King | Marie-Catherine de Marneffe | Cory Shain | Symon Stevens-Guille | Michael White
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper describes our “breaker” submission to the 2017 EMNLP “Build It Break It” shared task on sentiment analysis. In order to cause the “builder” systems to make incorrect predictions, we edited items in the blind test data according to linguistically interpretable strategies that allow us to assess the ease with which the builder systems learn various components of linguistic structure. On the whole, our submitted pairs break all systems at a high rate (72.6%), indicating that sentiment analysis as an NLP task may still have a lot of ground to cover. Of the breaker strategies that we consider, we find our semantic and pragmatic manipulations to pose the most substantial difficulties for the builder systems.

pdf bib
Parsing with Dynamic Continuized CCG
Michael White | Simon Charlow | Jordan Needle | Dylan Bumford
Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms

2016

pdf bib
A Corpus of Word-Aligned Asked and Anticipated Questions in a Virtual Patient Dialogue System
Ajda Gokcen | Evan Jaffe | Johnsey Erdmann | Michael White | Douglas Danforth
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a corpus of virtual patient dialogues to which we have added manually annotated gold standard word alignments. Since each question asked by a medical student in the dialogues is mapped to a canonical, anticipated version of the question, the corpus implicitly defines a large set of paraphrase (and non-paraphrase) pairs. We also present a novel process for selecting the most useful data to annotate with word alignments and for ensuring consistent paraphrase status decisions. In support of this process, we have enhanced the earlier Edinburgh alignment tool (Cohn et al., 2008) and revised and extended the Edinburgh guidelines, in particular adding guidance intended to ensure that the word alignments are consistent with the overall paraphrase status decision. The finished corpus and the enhanced alignment tool are made freely available.

pdf bib
Generating Disambiguating Paraphrases for Structurally Ambiguous Sentences
Manjuan Duan | Ethan Hill | Michael White
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf bib
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods
Annie Louis | Michael Roth | Bonnie Webber | Michael White | Luke Zettlemoyer
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods

pdf bib
Enhancing PTB Universal Dependencies for Grammar-Based Surface Realization
David L. King | Michael White
Proceedings of the 9th International Natural Language Generation conference

2015

pdf bib
Interpreting Questions with a Log-Linear Ranking Model in a Virtual Patient Dialogue System
Evan Jaffe | Michael White | William Schuler | Eric Fosler-Lussier | Alex Rosenfeld | Douglas Danforth
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Inducing Clause-Combining Rules: A Case Study with the SPaRKy Restaurant Corpus
Michael White | David M. Howcroft
Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)

2014

pdf bib
That’s Not What I Meant! Using Parsers to Avoid Structural Ambiguities in Generated Text
Manjuan Duan | Michael White
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Towards Surface Realization with CCGs Induced from Dependencies
Michael White
Proceedings of the 8th International Natural Language Generation Conference (INLG)

2013

pdf bib
Enhancing the Expression of Contrast in the SPaRKy Restaurant Corpus
David Howcroft | Crystal Nakatsu | Michael White
Proceedings of the 14th European Workshop on Natural Language Generation

2012

pdf bib
Shallow and Deep Paraphrasing for Improved Machine Translation Parameter Optimization
Dennis N. Mehay | Michael White
Workshop on Monolingual Machine Translation

String comparison methods such as BLEU (Papineni et al., 2002) are the de facto standard in MT evaluation (MTE) and in MT system parameter tuning (Och, 2003). It is difficult for these metrics to recognize legitimate lexical and grammatical paraphrases, which is important for MT system tuning (Madnani, 2010). We present two methods to address this: a shallow lexical substitution technique and a grammar-driven paraphrasing technique. Grammatically precise paraphrasing is novel in the context of MTE, and demonstrating its usefulness is a key contribution of this paper. We use these techniques to paraphrase a single reference, which, when used for parameter tuning, leads to superior translation performance over baselines that use only human-authored references.

pdf bib
A Joint Phrasal and Dependency Model for Paraphrase Alignment
Kapil Thadani | Scott Martin | Michael White
Proceedings of COLING 2012: Posters

pdf bib
Minimal Dependency Length in Realization Ranking
Michael White | Rajakrishnan Rajkumar
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
The Surface Realisation Task: Recent Developments and Future Plans
Anja Belz | Bernd Bohnet | Simon Mille | Leo Wanner | Michael White
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

pdf bib
Shared Task Proposal: Syntactic Paraphrase Ranking
Michael White
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

2011

pdf bib
Creating Disjunctive Logical Forms from Aligned Sentences for Grammar-Based Paraphrase Generation
Scott Martin | Michael White
Proceedings of the Workshop on Monolingual Text-To-Text Generation

pdf bib
Linguistically Motivated Complementizer Choice in Surface Realization
Rajakrishnan Rajkumar | Michael White
Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop

pdf bib
Glue Rules for Robust Chart Realization
Michael White
Proceedings of the 13th European Workshop on Natural Language Generation

pdf bib
The First Surface Realisation Shared Task: Overview and Evaluation Results
Anja Belz | Michael White | Dominic Espinosa | Eric Kow | Deirdre Hogan | Amanda Stent
Proceedings of the 13th European Workshop on Natural Language Generation

pdf bib
The OSU System for Surface Realization at Generation Challenges 2011
Rajakrishnan Rajkumar | Dominic Espinosa | Michael White
Proceedings of the 13th European Workshop on Natural Language Generation

2010

pdf bib
Designing Agreement Features for Realization Ranking
Rajakrishnan Rajkumar | Michael White
Coling 2010: Posters

pdf bib
Further Meta-Evaluation of Broad-Coverage Surface Realization
Dominic Espinosa | Rajakrishnan Rajkumar | Michael White | Shoshana Berleant
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Generating Tailored, Comparative Descriptions with Contextually Appropriate Intonation
Michael White | Robert A. J. Clark | Johanna D. Moore
Computational Linguistics, Volume 36, Number 2, June 2010

pdf bib
Finding Common Ground: Towards a Surface Realisation Shared Task
Anja Belz | Mike White | Josef van Genabith | Deirdre Hogan | Amanda Stent
Proceedings of the 6th International Natural Language Generation Conference

2009

pdf bib
Perceptron Reranking for CCG Realization
Michael White | Rajakrishnan Rajkumar
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Exploiting Named Entity Classes in CCG Surface Realization
Rajakrishnan Rajkumar | Michael White | Dominic Espinosa
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Grammar Engineering for CCG using Ant and XSLT
Scott Martin | Rajakrishnan Rajkumar | Michael White
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

2008

pdf bib
Projecting Propbank Roles onto the CCGbank
Stephen Boxwell | Michael White
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a method of accurately projecting Propbank roles onto constituents in the CCGbank and automatically annotating verbal categories with the semantic roles of their arguments. This method will be used to improve the structure of the derivations in the CCGbank and to facilitate research on semantic role tagging and broad coverage generation with CCG.

pdf bib
Hypertagging: Supertagging for Surface Realization with CCG
Dominic Espinosa | Michael White | Dennis Mehay
Proceedings of ACL-08: HLT

pdf bib
Proceedings of the Fifth International Natural Language Generation Conference
Michael White | Crystal Nakatsu | David McDonald
Proceedings of the Fifth International Natural Language Generation Conference

pdf bib
A More Precise Analysis of Punctuation for Broad-Coverage Surface Realization with CCG
Michael White | Rajakrishnan Rajkumar
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

2007

pdf bib
Towards broad coverage surface realization with CCG
Michael White | Rajakrishnan Rajkumar | Scott Martin
Proceedings of the Workshop on Using corpora for natural language generation

pdf bib
Avoiding Repetition in Generated Text
Mary Ellen Foster | Michael White
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

pdf bib
Learning to Say It Well: Reranking Realizations by Predicted Synthesis Quality
Crystal Nakatsu | Michael White
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
CCG Chart Realization from Disjunctive Inputs
Michael White
Proceedings of the Fourth International Natural Language Generation Conference

2005

pdf bib
Multimodal Generation in the COMIC Dialogue System
Mary E. Foster | Michael White | Andrea Setzer | Roberta Catizone
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib
Designing an Extensible API for Integrating Language Modeling and Realization
Michael White
Proceedings of Workshop on Software

2004

pdf bib
Techniques for Text Planning with XSLT
Mary Ellen Foster | Michael White
Proceeedings of the Workshop on NLP and XML (NLPXML-2004): RDF/RDFS and OWL in Language Technology

2003

pdf bib
Adapting Chart Realization to CCG
Michael White | Jason Baldridge
Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003

2002

pdf bib
Selecting sentences for multidocument summaries using randomized local search
Michael White | Claire Cardie
Proceedings of the ACL-02 Workshop on Automatic Summarization

pdf bib
Learning Domain-Specific Transfer Rules: An Experiment with Korean to English Translation
Benoit Lavoie | Michael White | Tanya Korelsky
COLING-02: Machine Translation in Asia

2001

pdf bib
Multidocument Summarization via Information Extraction
Michael White | Tanya Korelsky | Claire Cardie | Vincent Ng | David Pierce | Kiri Wagstaff
Proceedings of the First International Conference on Human Language Technology Research

pdf bib
Inducing Lexico-Structural Transfer Rules from Parsed Bi-texts
Benoit Lavoie | Michael White | Tanya Korelsky
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

2000

pdf bib
Towards Translingual Information Access using Portable Information Extraction
Michael White | Claire Cardie | Chung-hye Han | Nari Kim | Benoit Lavoie | Martha Palmer | Owen Rainbow | Juntae Yoon
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems

1998

pdf bib
EXEMPLARS: A Practical, Extensible Framework For Dynamic Text Generation
Michael White | Ted Caldwell
Natural Language Generation

1997

pdf bib
CogentHelp: NLG meets SE in a tool for authoring dynamically generated on-line help
Michael White | David E. Caldwell
Fifth Conference on Applied Natural Language Processing

1993

pdf bib
Delimitedness and Trajectory-of-Motion Events
Michael White
Sixth Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
The Imperfective Paradox and Trajectory-of-Motion Events
Michael White
31st Annual Meeting of the Association for Computational Linguistics

1992

pdf bib
Conceptual Structures and CCG: Linking Theory and Incorporated Argument Adjuncts
Michael White
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

pdf bib
On the Interpretation of Natural Language Instructions
Barbara Di Eugenio | Michael White
COLING 1992 Volume 4: The 14th International Conference on Computational Linguistics

Search
Co-authors
Fix data