Elliot Schumacher


2024

pdf bib
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
Albert Sun | Varun Nair | Elliot Schumacher | Anitha Kannan
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

A wave of new task-based virtual assistants has been fueled by increasingly powerful large language models (LLMs), such as GPT-4 (OpenAI, 2023). A major challenge in deploying LLM-based virtual conversational assistants in real world settings is ensuring they operate within what is admissible for the task. To overcome this challenge, the designers of these virtual assistants rely on an independent guardrail system that verifies the virtual assistant’s output aligns with the constraints required for the task. However, relying on commonly used, prompt-based guardrails can be difficult to engineer correctly and comprehensively. To address these challenges, we propose CONSCENDI. We use CONSCENDI to exhaustively generate training data with two key LLM-powered components: scenario-augmented generation and contrastive training examples. When generating conversational data, we generate a set of rule-breaking scenarios, which enumerate a diverse set of high-level ways a rule can be violated. This scenario-guided approach produces a diverse training set and provides chatbot designers greater control. To generate contrastive examples, we prompt the LLM to alter conversations with violations into acceptable conversations to enable fine-grained distinctions. We then use this data, generated by CONSCENDI, to train a smaller model. We find that CONSCENDI results in guardrail models that improve over baselines in multiple dialogue domains.

pdf bib
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
Varun Nair | Elliot Schumacher | Geoffrey Tso | Anitha Kannan
Proceedings of the 6th Clinical Natural Language Processing Workshop

Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate factually accurate and complete outputs. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types – a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher’s information and makes judgments on the final output.We test DERA against three clinically-focused tasks, with GPT-4 serving as our LLM. DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics for medical conversation summarization and care plan generation. In a new finding, we also show that GPT-4’s performance (70%) on an open-ended version of the MedQA question-answering (QA) dataset (Jin 2021; USMLE) is well above the passing level (60%), with DERA showing similar performance. We will release the open-ended MedQA dataset.

2023

pdf bib
Generating medically-accurate summaries of patient-provider dialogue: A multi-stage approach using large language models
Varun Nair | Elliot Schumacher | Anitha Kannan
Proceedings of the 5th Clinical Natural Language Processing Workshop

A medical provider’s summary of a patient visit serves several critical purposes, including clinical decision-making, facilitating hand-offs between providers, and as a reference for the patient. An effective summary is required to be coherent and accurately capture all the medically relevant information in the dialogue, despite the complexity of patient-generated language. Even minor inaccuracies in visit summaries (for example, summarizing “patient does not have a fever” when a fever is present) can be detrimental to the outcome of care for the patient. This paper tackles the problem of medical conversation summarization by discretizing the task into several smaller dialogue-understanding tasks that are sequentially built upon. First, we identify medical entities and their affirmations within the conversation to serve as building blocks. We study dynamically constructing few-shot prompts for tasks by conditioning on relevant patient information and use GPT-3 as the backbone for our experiments. We also develop GPT-derived summarization metrics to measure performance against reference summaries quantitatively. Both our human evaluation study and metrics for medical correctness show that summaries generated using this approach are clinically accurate and outperform the baseline approach of summarizing the dialog in a zero-shot, single-prompt setting.

pdf bib
On the Surprising Effectiveness of Name Matching Alone in Autoregressive Entity Linking
Elliot Schumacher | James Mayfield | Mark Dredze
Proceedings of the First Workshop on Matching From Unstructured and Structured Data (MATCHING 2023)

Fifteen years of work on entity linking has established the importance of different information sources in making linking decisions: mention and entity name similarity, contextual relevance, and features of the knowledge base. Modern state-of-the-art systems build on these features, including through neural representations (Wu et al., 2020). In contrast to this trend, the autoregressive language model GENRE (De Cao et al., 2021) generates normalized entity names for mentions and beats many other entity linking systems, despite making no use of knowledge base (KB) information. How is this possible? We analyze the behavior of GENRE on several entity linking datasets and demonstrate that its performance stems from memorization of name patterns. In contrast, it fails in cases that might benefit from using the KB. We experiment with a modification to the model to enable it to utilize KB information, highlighting challenges to incorporating traditional entity linking information sources into autoregressive models.

2022

pdf bib
Zero-shot Cross-Language Transfer of Monolingual Entity Linking Models
Elliot Schumacher | James Mayfield | Mark Dredze
Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL)

Most entity linking systems, whether mono or multilingual, link mentions to a single English knowledge base. Few have considered linking non-English text to a non-English KB, and therefore, transferring an English entity linking model to both a new document and KB language. We consider the task of zero-shot cross-language transfer of entity linking systems to a new language and KB. We find that a system trained with multilingual representations does reasonably well, and propose improvements to system training that lead to improved recall in most datasets, often matching the in-language performance. We further conduct a detailed evaluation to elucidate the challenges of this setting.

2021

pdf bib
Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking
Elliot Schumacher | James Mayfield | Mark Dredze
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib
Clinical Concept Linking with Contextualized Neural Representations
Elliot Schumacher | Andriy Mulyar | Mark Dredze
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In traditional approaches to entity linking, linking decisions are based on three sources of information – the similarity of the mention string to an entity’s name, the similarity of the context of the document to the entity, and broader information about the knowledge base (KB). In some domains, there is little contextual information present in the KB and thus we rely more heavily on mention string similarity. We consider one example of this, concept linking, which seeks to link mentions of medical concepts to a medical concept ontology. We propose an approach to concept linking that leverages recent work in contextualized neural models, such as ELMo (Peters et al. 2018), which create a token representation that integrates the surrounding context of the mention and concept name. We find a neural ranking approach paired with contextualized embeddings provides gains over a competitive baseline (Leaman et al. 2013). Additionally, we find that a pre-training step using synonyms from the ontology offers a useful initialization for the ranker.

2016

pdf bib
Predicting the Relative Difficulty of Single Sentences With and Without Surrounding Context
Elliot Schumacher | Maxine Eskenazi | Gwen Frishkoff | Kevyn Collins-Thompson
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing