Neha Srikanth


2024

How Often Are Errors in Natural Language Reasoning Due to Paraphrastic Variability?
Neha Srikanth | Marine Carpuat | Rachel Rudinger
Transactions of the Association for Computational Linguistics, Volume 12

Large language models have been shown to behave inconsistently in response to meaning-preserving paraphrastic inputs. At the same time, researchers evaluate the knowledge and reasoning abilities of these models with test evaluations that do not disaggregate the effect of paraphrastic variability on performance. We propose a metric, PC, for evaluating the paraphrastic consistency of natural language reasoning models based on the probability of a model achieving the same correctness on two paraphrases of the same problem. We mathematically connect this metric to the proportion of a model’s variance in correctness attributable to paraphrasing. To estimate PC, we collect ParaNlu, a dataset of 7,782 human-written and validated paraphrased reasoning problems constructed on top of existing benchmark datasets for defeasible and abductive natural language inference. Using ParaNlu, we measure the paraphrastic consistency of several model classes and show that consistency dramatically increases with pretraining but not fine-tuning. All models tested exhibited room for improvement in paraphrastic consistency.
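
A minimal sketch of how such an estimate could be computed, assuming PC is read as the empirical probability that two distinct paraphrases of the same problem receive the same correctness outcome, averaged over problems. The function name and input format here are illustrative assumptions, not the paper's released code:

    from itertools import combinations
    from statistics import mean

    def paraphrastic_consistency(correctness_by_problem):
        """Estimate PC: the probability that two paraphrases of the
        same problem are answered with the same correctness.

        correctness_by_problem: dict mapping a problem id to a list
        of 0/1 correctness outcomes, one per paraphrase of that
        problem. (Assumed data format for this sketch.)
        """
        per_problem = []
        for outcomes in correctness_by_problem.values():
            pairs = list(combinations(outcomes, 2))
            if not pairs:
                continue  # need at least two paraphrases to form a pair
            agree = sum(a == b for a, b in pairs)
            per_problem.append(agree / len(pairs))
        return mean(per_problem)

    # Toy usage: problem q1 has 4 paraphrases, 3 answered correctly
    # (agreement 3/6); q2 has 2 paraphrases, both correct (1/1).
    print(paraphrastic_consistency({"q1": [1, 1, 1, 0], "q2": [1, 1]}))  # 0.75

A perfectly consistent model (all paraphrases of a problem either all right or all wrong) would score 1.0 under this estimate, regardless of its overall accuracy.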

Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering
Neha Srikanth | Rupak Sarkar | Heran Mane | Elizabeth Aparicio | Quynh Nguyen | Rachel Rudinger | Jordan Boyd-Graber
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Questions posed by information-seeking users often contain implicit false or potentially harmful assumptions. In a high-risk domain such as maternal and infant health, a question-answering system must recognize these pragmatic constraints and go beyond simply answering user questions, examining them in context to respond helpfully. To achieve this, we study assumptions and implications, or pragmatic inferences, made when mothers ask questions about pregnancy and infant care by collecting a dataset of 2,727 inferences from 500 questions across three diverse sources. We study how health experts naturally address these inferences when writing answers, and illustrate that informing existing QA pipelines with pragmatic inferences produces responses that are more complete, mitigating the propagation of harmful beliefs.
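
One plausible way to inform an existing QA pipeline with such inferences is to surface them in the prompt so the answer can confirm or correct them. A hedged sketch, in which the function, prompt wording, and interface are assumptions of this example rather than the paper's actual system:

    def answer_with_inferences(question, inferences, qa_model):
        """Sketch of an inference-informed QA step: make the asker's
        implicit assumptions explicit so the response can address any
        that are false or potentially harmful. `qa_model` is any
        callable mapping a prompt string to an answer string
        (illustrative interface, not the paper's pipeline).
        """
        assumption_block = "\n".join(f"- {inf}" for inf in inferences)
        prompt = (
            f"Question: {question}\n"
            f"The asker may be assuming:\n{assumption_block}\n"
            "Answer the question, and correct any assumption above "
            "that is false or potentially harmful."
        )
        return qa_model(prompt)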

2022

Partial-input baselines show that NLI models can ignore context, but they don’t.
Neha Srikanth | Rachel Rudinger
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

When strong partial-input baselines reveal artifacts in crowdsourced NLI datasets, the performance of full-input models trained on such datasets is often dismissed as reliance on spurious correlations. We investigate whether state-of-the-art NLI models are capable of overriding default inferences made by a partial-input baseline. We introduce an evaluation set of 600 examples consisting of perturbed premises to examine a RoBERTa model’s sensitivity to edited contexts. Our results indicate that NLI models are still capable of learning to condition on context—a necessary component of inferential reasoning—despite being trained on artifact-ridden datasets.
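
A minimal sketch of this kind of perturbed-premise probe, assuming each example pairs an original premise with an edited premise and the gold label it should induce, with the hypothesis held fixed. The field names and `nli_model` interface are illustrative, not the released evaluation set:

    def context_sensitivity(nli_model, examples):
        """Fraction of examples where the model's prediction moves to
        the new gold label after the premise is edited. A model that
        ignores context (like a hypothesis-only baseline, by
        construction) would score near zero. `nli_model` maps
        (premise, hypothesis) to a label string (assumed interface).
        """
        flipped = 0
        for ex in examples:
            before = nli_model(ex["premise"], ex["hypothesis"])
            after = nli_model(ex["edited_premise"], ex["hypothesis"])
            if before != after and after == ex["edited_label"]:
                flipped += 1
        return flipped / len(examples)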

2021

Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification
Neha Srikanth | Junyi Jessy Li
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021