Katherine Keith


Literary Intertextual Semantic Change Detection: Application and Motivation for Evaluating Models on Small Corpora
Jackson Ehrenworth | Katherine Keith
Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change

Lexical semantic change detection is the study of how words change meaning between corpora. While Schlechtweg et al. (2020) standardized both datasets and evaluation metrics for this shared task, for those interested in applying semantic change detection models to small corpora—e.g., in the digital humanities—there is a need for evaluation involving much smaller datasets. We present a method and open-source code pipeline for downsampling the SemEval-2020 Task 1 corpora while preserving gold standard measures of semantic change. We then evaluate several state-of-the-art models trained on these downsampled corpora and find both dramatically decreased performance (average 67% decrease) and high variance. We also propose a novel application to the digital humanities and provide a case study demonstrating that semantic change detection can be used in an exploratory manner to produce insightful avenues of investigation for literary scholars.
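
SemEval-2020 Task 1 Subtask 2 ranks target words by degree of change and scores a model with Spearman's rank correlation against the gold ranking. The sketch below shows only that evaluation step, repeated over several downsampled runs so that both the mean and the spread of the score are visible. The downsampling procedure itself is not reproduced, and the gold scores and predictions in the usage lines are made-up placeholders.

```python
import random
import statistics


def rank(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def evaluate_runs(gold, runs):
    """Mean and standard deviation of rho across runs (needs at least two runs)."""
    rhos = [spearman(gold, preds) for preds in runs]
    return statistics.mean(rhos), statistics.stdev(rhos)


# Hypothetical data: graded change for five target words, plus predictions
# from a model trained on three independent downsamples (simulated as noise).
gold = [0.10, 0.45, 0.30, 0.80, 0.55]
rng = random.Random(0)
runs = [[g + rng.gauss(0, 0.2) for g in gold] for _ in range(3)]
mean_rho, sd_rho = evaluate_runs(gold, runs)
print(f"Spearman rho: {mean_rho:.2f} +/- {sd_rho:.2f}")
```

Reporting the standard deviation alongside the mean is what exposes the high variance that small corpora introduce.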

Causal Matching with Text Embeddings: A Case Study in Estimating the Causal Effects of Peer Review Policies
Raymond Zhang | Neha Nayak Kennard | Daniel Smith | Daniel McFarland | Andrew McCallum | Katherine Keith
Findings of the Association for Computational Linguistics: ACL 2023

A promising approach to estimate the causal effects of peer review policies is to analyze data from publication venues that shift policies from single-blind to double-blind from one year to the next. However, in these settings the content of the manuscript is a confounding variable—each year has a different distribution of scientific content which may naturally affect the distribution of reviewer scores. To address this textual confounding, we extend variable ratio nearest neighbor matching to incorporate text embeddings. We compare this matching method to a widely-used causal method of stratified propensity score matching and a baseline of randomly selected matches. For our case study of the ICLR conference shifting from single- to double-blind review from 2017 to 2018, we find human judges prefer manuscript matches from our method in 70% of cases. While the unadjusted estimate of the average causal effect on reviewers’ scores is -0.25, our method shifts the estimate to -0.17, a slightly smaller difference between the outcomes of single- and double-blind policies. We hope this case study enables exploration of additional text-based causal estimation methods and domains in the future.
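
A minimal sketch of what variable ratio nearest neighbor matching on text embeddings can look like, under stated assumptions: each treated manuscript (say, a double-blind submission) keeps between one and `k_max` control manuscripts whose cosine similarity clears a `threshold`, unmatched manuscripts are dropped, and the effect is the mean treated-minus-matched-control difference in reviewer score. The parameter names, the similarity rule, and the tiny hand-made vectors are hypothetical and not taken from the paper; real embeddings would come from a text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def variable_ratio_match(treated, control, k_max=3, threshold=0.8):
    """Match each treated id to up to k_max most similar control ids above threshold."""
    matches = {}
    for t_id, t_emb in treated.items():
        sims = sorted(
            ((cosine(t_emb, c_emb), c_id) for c_id, c_emb in control.items()),
            reverse=True,
        )
        kept = [c_id for sim, c_id in sims[:k_max] if sim >= threshold]
        if kept:  # treated units without a close enough control are dropped
            matches[t_id] = kept
    return matches


def effect_on_treated(matches, y_treated, y_control):
    """Average of (treated outcome - mean outcome of its matched controls)."""
    diffs = [
        y_treated[t] - sum(y_control[c] for c in cs) / len(cs)
        for t, cs in matches.items()
    ]
    return sum(diffs) / len(diffs)


# Hypothetical 2-d "embeddings" and reviewer scores.
treated = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
control = {"x": [1.0, 0.1], "y": [0.0, 1.0], "z": [-1.0, 0.0]}
matches = variable_ratio_match(treated, control)
estimate = effect_on_treated(matches, {"a": 5, "b": 3}, {"x": 4, "y": 2, "z": 0})
```

Because the estimate averages only over treated units that found a match, it targets the effect on the (matched) treated group; the threshold trades match quality against how many manuscripts are discarded.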

Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
Li Lucy | Jesse Dodge | David Bamman | Katherine Keith
Findings of the Association for Computational Linguistics: ACL 2023

Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring scholarly jargon from text. Expanding the scope of prior work which focuses on word types, we use word sense induction to also identify words that are widespread but overloaded with different meanings across fields. We then estimate the prevalence of these discipline-specific words and senses across hundreds of subfields, and show that word senses provide a complementary, yet unique view of jargon alongside word types. We demonstrate the utility of our metrics for science of science and computational sociolinguistics by highlighting two key social implications. First, though most fields reduce their use of jargon when writing for general-purpose venues, some fields (e.g., biological sciences) do so less than others. Second, the direction of correlation between jargon and citation rates varies among fields, but jargon is nearly always negatively correlated with interdisciplinary impact. Broadly, our findings suggest that though multidisciplinary venues intend to cater to more general audiences, some fields’ writing norms may act as barriers rather than bridges, and thus impede the dispersion of scholarly ideas.
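
One common way to score how discipline-specific a word type is uses normalized pointwise mutual information (NPMI) between the word and a field, computed from token counts. The sketch below assumes that metric for illustration; it is not presented as the paper's exact scoring, and the toy corpus is hypothetical. In principle the same scoring could be applied to induced word senses in place of word types.

```python
import math
from collections import Counter


def npmi_scores(field_tokens):
    """Map (field, word) -> NPMI between the word and the field, from token counts."""
    joint, word_totals, field_totals = Counter(), Counter(), Counter()
    for field, tokens in field_tokens.items():
        for tok in tokens:
            joint[(field, tok)] += 1
            word_totals[tok] += 1
            field_totals[field] += 1
    n = sum(field_totals.values())
    scores = {}
    for (field, word), count in joint.items():
        p_wf = count / n
        p_w = word_totals[word] / n
        p_f = field_totals[field] / n
        if p_wf == 1.0:
            # Degenerate corpus: a single word type in a single field.
            scores[(field, word)] = 1.0
            continue
        # NPMI lies in [-1, 1]; values near 1 mean the word concentrates in this field.
        scores[(field, word)] = math.log(p_wf / (p_w * p_f)) / -math.log(p_wf)
    return scores


# Hypothetical toy corpus: tokens grouped by field.
corpus = {"bio": ["cell", "cell", "data"], "cs": ["data", "kernel"]}
scores = npmi_scores(corpus)
```

A word like "data" that is spread across fields scores low everywhere, while "cell" and "kernel" score high only in the field where they cluster.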


Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
David Bamman | Dirk Hovy | David Jurgens | Katherine Keith | Brendan O'Connor | Svitlana Volkova
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)


Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
Andrew Halterman | Katherine Keith | Sheikh Sarwar | Brendan O’Connor
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Proceedings of the First Workshop on Causal Inference and NLP
Amir Feder | Katherine Keith | Emaad Manzoor | Reid Pryzant | Dhanya Sridhar | Zach Wood-Doughty | Jacob Eisenstein | Justin Grimmer | Roi Reichart | Molly Roberts | Uri Shalit | Brandon Stewart | Victor Veitch | Diyi Yang
Proceedings of the First Workshop on Causal Inference and NLP

Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects
Katherine Keith | Douglas Rice | Brendan O’Connor
Proceedings of the First Workshop on Causal Inference and NLP

Using observed language to understand interpersonal interactions is important in high-stakes decision making. We propose a causal research design for observational (non-experimental) data to estimate the natural direct and indirect effects of social group signals (e.g., race or gender) on speakers’ responses with separate aspects of language as causal mediators. We illustrate the promises and challenges of this framework via a theoretical case study of the effect of an advocate’s gender on interruptions from justices during U.S. Supreme Court oral arguments. We also discuss challenges conceptualizing and operationalizing causal variables such as gender and language that comprise many components, and we articulate technical open challenges such as temporal dependence between language mediators in conversational settings.
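
For reference, the standard potential-outcomes definitions of the natural direct and indirect effects are below. The notation is ours, not necessarily the paper's: T is the group signal (e.g., advocate gender, coded 0/1), M(t) is the value a language mediator would take under signal t, and Y(t, m) is the response (e.g., interruptions) under signal t and mediator value m.

```latex
\begin{aligned}
\mathrm{TE}  &= \mathbb{E}\big[Y(1, M(1)) - Y(0, M(0))\big] \\
\mathrm{NDE} &= \mathbb{E}\big[Y(1, M(0)) - Y(0, M(0))\big] \\
\mathrm{NIE} &= \mathbb{E}\big[Y(1, M(1)) - Y(1, M(0))\big] \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}
\end{aligned}
```

The decomposition holds by construction, since the Y(1, M(0)) terms cancel; identifying each piece from observational data is what requires the additional mediation assumptions.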


Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates
Katherine Keith | David Jensen | Brendan O’Connor
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an individual’s entire history of social media posts or the content of a news article could provide a rich measurement of multiple confounders. Yet, methods and applications for this problem are scattered across different communities and evaluation practices are inconsistent. This review is the first to gather and categorize these examples and provide a guide to data-processing and evaluation decisions. Despite increased attention on adjusting for confounding using text, there are still many open problems, which we highlight in this paper.
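
A minimal sketch of the basic adjustment at stake: if a discrete confounder Z is measured from text (here a hypothetical topic label per document), the backdoor adjustment estimates the average treatment effect as the sum over z of P(z) times (E[Y | T=1, z] - E[Y | T=0, z]). This assumes Z blocks every backdoor path and that both treatment arms appear in every stratum. Text-derived confounders are usually high-dimensional, so this is only an illustration of the estimand, with made-up data.

```python
from collections import defaultdict


def backdoor_ate(rows):
    """rows: (z, t, y) with z a text-derived confounder, t in {0, 1}, y numeric."""
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in rows:
        by_stratum[z][t].append(y)
    n = len(rows)
    ate = 0.0
    for z, groups in by_stratum.items():
        if not groups[0] or not groups[1]:
            raise ValueError(f"positivity violated in stratum {z!r}")
        weight = (len(groups[0]) + len(groups[1])) / n  # estimate of P(z)
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        ate += weight * diff
    return ate


# Hypothetical data: treatment is more common in topic "B", which also has higher outcomes.
rows = [("A", 1, 3), ("A", 0, 2), ("A", 0, 2), ("A", 0, 2),
        ("B", 1, 6), ("B", 1, 6), ("B", 1, 6), ("B", 0, 5)]
adjusted = backdoor_ate(rows)
```

In this toy example the naive treated-minus-control difference in means is 2.5, while the topic-adjusted estimate is 1.0, because topic "B" raises both the chance of treatment and the outcome.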

Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty
Katherine Keith | Christoph Teichmann | Brendan O’Connor | Edgar Meij
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors’ annotations and text measurements, we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? For this application, we find that (1) some annotator disagreements of economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword-matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.
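
A minimal sketch of a keyword-matching measurement of this kind: an article is flagged when it contains at least one term from each of an economy, a policy, and an uncertainty category, and each month's value is the share of flagged articles. The term lists are illustrative, modeled on the published index's categories, and should be treated as an assumption; the `pearson` helper shows one way (Pearson correlation, also an assumption) to compare this series with an alternative measurement such as classifier scores.

```python
import math
import re
from collections import defaultdict

# Illustrative term sets (assumption), one list per required category.
ECONOMY = ["economic", "economy"]
POLICY = ["congress", "deficit", "federal reserve", "legislation", "regulation", "white house"]
UNCERTAINTY = ["uncertain", "uncertainty"]


def has_any(text, terms):
    """True if any term occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def is_epu_article(text):
    text = text.lower()
    return has_any(text, ECONOMY) and has_any(text, POLICY) and has_any(text, UNCERTAINTY)


def monthly_share(articles):
    """articles: iterable of (month, text). Returns month -> share of flagged articles."""
    hits, totals = defaultdict(int), defaultdict(int)
    for month, text in articles:
        totals[month] += 1
        hits[month] += is_epu_article(text)
    return {m: hits[m] / totals[m] for m in sorted(totals)}


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Because the rule requires all three categories, a single ambiguous word (for example, "uncertain" used about a sports result) can flip an article in or out, which is one reason alternative measurements may disagree.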


Modeling Financial Analysts’ Decision Making via the Pragmatics and Semantics of Earnings Calls
Katherine Keith | Amanda Stent
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Every fiscal quarter, companies hold earnings calls in which company executives respond to questions from analysts. After these calls, analysts often change their price target recommendations, which are used in equity research reports to help investors make decisions. In this paper, we examine analysts’ decision-making behavior as it pertains to the language content of earnings calls. We identify a set of 20 pragmatic features of analysts’ questions which we correlate with analysts’ pre-call investor recommendations. We also analyze the degree to which semantic and pragmatic features from an earnings call complement market data in predicting analysts’ post-call changes in price targets. Our results show that earnings calls are moderately predictive of analysts’ decisions even though these decisions are influenced by a number of other factors including private communication with company executives and market conditions. A breakdown of model errors indicates disparate performance on calls from different market sectors.


Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses
Katherine Keith | Su Lin Blodgett | Brendan O’Connor
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full joint distribution of parse trees defined by a transition-based parsing model, and demonstrate the use of the samples in probabilistic dependency analysis. First, we define the new task of dependency path prediction, inferring syntactic substructures over part of a sentence, and provide the first analysis of performance on this task. Second, we demonstrate the usefulness of our Monte Carlo syntax marginal method for parser error analysis and calibration. Finally, we use this method to propagate parse uncertainty to two downstream information extraction applications: identifying persons killed by police and semantic role assignment.
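
Once trees have been sampled from a parser, the Monte Carlo estimate of an arc's marginal probability is simply the fraction of samples that contain it. The sketch below assumes each sample is a head array (index 0 is ROOT and unused, `heads[d]` is the head of token d); the transition sampler itself is not reproduced, so the samples are hard-coded and hypothetical. The `uncertain_tokens` helper, also hypothetical, flags tokens whose best head falls below a probability threshold, the kind of signal useful for error analysis.

```python
from collections import Counter


def edge_marginals(samples):
    """Estimate P(head -> dep) as the fraction of sampled trees containing that arc."""
    counts = Counter()
    for heads in samples:
        for dep in range(1, len(heads)):
            counts[(heads[dep], dep)] += 1
    s = len(samples)
    return {edge: c / s for edge, c in counts.items()}


def uncertain_tokens(marginals, threshold=0.9):
    """Tokens whose most probable head has estimated marginal below threshold."""
    best = {}
    for (head, dep), p in marginals.items():
        if p > best.get(dep, 0.0):
            best[dep] = p
    return sorted(dep for dep, p in best.items() if p < threshold)


# Four hypothetical sampled trees for a 3-token sentence.
samples = [
    [None, 2, 0, 2],
    [None, 2, 0, 2],
    [None, 3, 0, 2],
    [None, 2, 0, 2],
]
marginals = edge_marginals(samples)
```

Here token 1 attaches to token 2 in three of four samples, so its arc gets an estimated marginal of 0.75 and it is the only token flagged as uncertain.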

Uncertainty-aware generative models for inferring document class prevalence
Katherine Keith | Brendan O’Connor
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Prevalence estimation is the task of inferring the relative frequency of classes of unlabeled examples in a group—for example, the proportion of a document collection with positive sentiment. Previous work has focused on aggregating and adjusting discriminative individual classifiers to obtain prevalence point estimates. But imperfect classifier accuracy ought to be reflected in uncertainty over the predicted prevalence for scientifically valid inference. In this work, we present (1) a generative probabilistic modeling approach to prevalence estimation, and (2) the construction and evaluation of prevalence confidence intervals; in particular, we demonstrate that an off-the-shelf discriminative classifier can be given a generative re-interpretation, by backing out an implicit individual-level likelihood function, which can be used to conduct fast and simple group-level Bayesian inference. Empirically, we demonstrate our approach provides better confidence interval coverage than an alternative, and is dramatically more robust to shifts in the class prior between training and testing.
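
A minimal sketch of the generative re-interpretation for two classes, assuming the implicit likelihood is backed out with Bayes' rule as p(x | y) proportional to p_train(y | x) / p_train(y), where p_train(y | x) is the classifier's output and p_train(y) the training class prior. With a uniform prior on the prevalence theta, the group-level posterior is then evaluated on a grid; the grid approximation and the function names are our choices for illustration, and the classifier outputs in the usage lines are made up.

```python
import math


def prevalence_posterior(probs, train_prior, grid_size=999):
    """Grid posterior over binary prevalence theta under a uniform prior.

    probs: classifier probabilities p_train(y=1 | x_i) for the unlabeled group.
    train_prior: p_train(y=1), the positive rate in the classifier's training data.
    """
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    log_post = []
    for theta in grid:
        # Mixture of implicit likelihoods: theta * p(x|y=1) + (1 - theta) * p(x|y=0).
        ll = sum(
            math.log(theta * q / train_prior + (1 - theta) * (1 - q) / (1 - train_prior))
            for q in probs
        )
        log_post.append(ll)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return grid, [w / z for w in weights]


def credible_interval(grid, post, level=0.95):
    """Central credible interval read off the cumulative grid posterior."""
    lo_target, hi_target = (1 - level) / 2, 1 - (1 - level) / 2
    cum, lo, hi = 0.0, None, None
    for theta, p in zip(grid, post):
        cum += p
        if lo is None and cum >= lo_target:
            lo = theta
        if hi is None and cum >= hi_target:
            hi = theta
    return lo, hi


# Hypothetical group: 30 documents scored 0.9 and 70 scored 0.1 by a classifier
# trained with a balanced (0.5) class prior.
probs = [0.9] * 30 + [0.1] * 70
grid, post = prevalence_posterior(probs, train_prior=0.5)
mode = grid[post.index(max(post))]
lo, hi = credible_interval(grid, post)
```

The posterior mode lands near 0.25 rather than the 0.3 obtained by counting documents above 0.5, because the implicit likelihood accounts for the classifier's imperfect confidence, and the interval (lo, hi) expresses the remaining uncertainty.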


Identifying civilians killed by police with distantly supervised entity-event extraction
Katherine Keith | Abram Handler | Michael Pinkham | Cara Magliozzi | Joshua McDuffie | Brendan O’Connor
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new, socially-impactful task for natural language processing: from a news corpus, extract names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and present a model to solve this problem that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outperforms two off-the-shelf event extractor systems, and it can suggest candidate victim names in some cases faster than one of the major manually-collected police fatality databases.
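
A minimal sketch of how mention-level classifier outputs can be turned into a ranked list of candidate names, assuming a noisy-or aggregation: an entity is predicted as a victim if at least one of its mentions is, so P(entity) = 1 - product over mentions of (1 - p_mention). The aggregation choice, the function names, and the probabilities below are assumptions for illustration; the EM-based distant supervision used to train the mention classifier is not shown.

```python
import math
from collections import defaultdict


def noisy_or(mention_probs):
    """Probability that at least one mention is positive, assuming independence."""
    return 1.0 - math.prod(1.0 - p for p in mention_probs)


def rank_entities(mentions):
    """mentions: iterable of (entity_name, p_mention). Returns [(name, score)] sorted desc."""
    by_entity = defaultdict(list)
    for name, p in mentions:
        by_entity[name].append(p)
    scores = {name: noisy_or(ps) for name, ps in by_entity.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical mention-level probabilities from a classifier.
ranked = rank_entities([("Entity A", 0.5), ("Entity A", 0.5), ("Entity B", 0.6)])
```

Noisy-or lets repeated weak evidence accumulate: two mentions at 0.5 outrank a single mention at 0.6, which suits news corpora where the same event is reported many times.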