Omar Agha


2021

NOPE: A Corpus of Naturally-Occurring Presuppositions in English
Alicia Parrish | Sebastian Schuster | Alex Warstadt | Omar Agha | Soo-Hwan Lee | Zhuoye Zhao | Samuel R. Bowman | Tal Linzen
Proceedings of the 25th Conference on Computational Natural Language Learning

Understanding language requires not only grasping the overtly stated content but also making inferences about things that were left unsaid. These inferences include presuppositions, a phenomenon by which a listener learns about new information through reasoning about what a speaker takes as given. Presuppositions require complex understanding of the lexical and syntactic properties that trigger them as well as the broader conversational context. In this work, we introduce the Naturally-Occurring Presuppositions in English (NOPE) Corpus to investigate the context-sensitivity of 10 different types of presupposition triggers and to evaluate machine learning models’ ability to predict human inferences. We find that most of the triggers we investigate exhibit moderate variability. We further find that transformer-based models draw correct inferences in simple cases involving presuppositions, but they fail to capture the minority of exceptional cases in which human judgments reveal complex interactions between context and triggers.

Does Putting a Linguist in the Loop Improve NLU Data Collection?
Alicia Parrish | William Huang | Omar Agha | Soo-Hwan Lee | Nikita Nangia | Alexia Warstadt | Karmanya Aggarwal | Emily Allaway | Tal Linzen | Samuel R. Bowman
Findings of the Association for Computational Linguistics: EMNLP 2021

Many crowdsourced NLP datasets contain systematic artifacts that are identified only after data collection is complete. Earlier identification of these issues should make it easier to create high-quality training and evaluation data. We attempt this by evaluating protocols in which expert linguists work ‘in the loop’ during data collection to identify and address these issues by adjusting task instructions and incentives. Using natural language inference as a test case, we compare three data collection protocols: (i) a baseline protocol with no linguist involvement, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the writing task, and (iii) an extension that adds direct interaction between linguists and crowdworkers via a chatroom. We find that linguist involvement does not lead to increased accuracy on out-of-domain test sets compared to baseline, and adding a chatroom has no effect on the data. Linguist involvement does, however, lead to more challenging evaluation data and higher accuracy on some challenge sets, demonstrating the benefits of integrating expert analysis during data collection.