Benjamin Newman


2023

pdf bib
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Benjamin Newman | Luca Soldaini | Raymond Fok | Arman Cohan | Kyle Lo
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may find snippets difficult to understand as they lack context from the original document. In this work, we use language models to rewrite snippets from scientific documents to be read on their own. First, we define the requirements and challenges for this user-facing decontextualization task, such as clarifying where edits occur and handling references to other documents. Second, we propose a framework that decomposes the task into three stages: question generation, question answering, and rewriting. Using this framework, we collect gold decontextualizations from experienced scientific article readers. We then conduct a range of experiments across state-of-the-art commercial and open-source language models to identify how to best provide missing-but-relevant information to models for our task. Finally, we develop QaDecontext, a simple prompting strategy inspired by our framework that improves over end-to-end prompting. We conclude with analysis that finds, while rewriting is easy, question generation and answering remain challenging for today’s models.

pdf bib
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Kyle Lo | Zejiang Shen | Benjamin Newman | Joseph Chang | Russell Authur | Erin Bransom | Stefan Candra | Yoganand Chandrasekhar | Regan Huff | Bailey Kuehl | Amanpreet Singh | Chris Wilhelm | Angele Zamarron | Marti A. Hearst | Daniel Weld | Doug Downey | Luca Soldaini
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in difficult-to-use PDF formats, and the ecosystem of models to process them is fragmented and incomplete. We introduce PaperMage, an open-source Python toolkit for analyzing and processing visually-rich, structured scientific documents. PaperMage offers clean and intuitive abstractions for seamlessly representing and manipulating both textual and visual document elements. PaperMage achieves this by integrating disparate state-of-the-art NLP and CV models into a unified framework, and provides turn-key recipes for common scientific document processing use-cases. PaperMage has powered multiple research prototypes of AI applications over scientific documents, along with Semantic Scholar’s large-scale production system for processing millions of PDFs. GitHub: https://github.com/allenai/papermage

2021

pdf bib
Refining Targeted Syntactic Evaluation of Language Models
Benjamin Newman | Kai-Siang Ang | Julia Gong | John Hewitt
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates language models’ syntactic knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb’s conjugation. The method evaluates whether language models rate each grammatical sentence as more likely than its ungrammatical counterpart. We identify two distinct goals for TSE. First, evaluating the systematicity of a language model’s syntactic knowledge: given a sentence, can it conjugate arbitrary verbs correctly? Second, evaluating a model’s likely behavior: given a sentence, does the model concentrate its probability mass on correctly conjugated verbs, even if only on a subset of the possible verbs? We argue that current implementations of TSE do not directly capture either of these goals, and propose new metrics to capture each goal separately. Under our metrics, we find that TSE overestimates systematicity of language models, but that models score up to 40% better on verbs that they predict are likely in context.

2020

pdf bib
Communication-based Evaluation for Natural Language Generation
Benjamin Newman | Reuben Cohn-Gordon | Christopher Potts
Proceedings of the Society for Computation in Linguistics 2020

pdf bib
The EOS Decision and Length Extrapolation
Benjamin Newman | John Hewitt | Percy Liang | Christopher D. Manning
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Extrapolation to unseen sequence lengths is a challenge for neural generative models of language. In this work, we characterize the effect on length extrapolation of a modeling decision often overlooked: predicting the end of the generative process through the use of a special end-of-sequence (EOS) vocabulary item. We study an oracle setting - forcing models to generate to the correct sequence length at test time - to compare the length-extrapolative behavior of networks trained to predict EOS (+EOS) with networks not trained to (-EOS). We find that -EOS substantially outperforms +EOS, for example extrapolating well to lengths 10 times longer than those seen at training time in a bracket closing task, as well as achieving a 40% improvement over +EOS in the difficult SCAN dataset length generalization task. By comparing the hidden states and dynamics of -EOS and +EOS models, we observe that +EOS models fail to generalize because they (1) unnecessarily stratify their hidden states by their linear position is a sequence (structures we call length manifolds) or (2) get stuck in clusters (which we refer to as length attractors) once the EOS token is the highest-probability prediction.