Brendan O’Connor - ACL Anthology

Brendan O’Connor

Also published as: Brendan O'Connor

2026

Coordinates from Context: Using LLMs to Ground Complex Location References
Tessa Masis | Brendan O'Connor
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Geocoding is the task of linking a location reference to an actual geographic location and is essential for many downstream analyses of unstructured text. In this paper, we explore the challenging setting of geocoding compositional location references. Building on recent work demonstrating LLMs’ abilities to reason over geospatial data, we evaluate LLMs’ geospatial knowledge versus reasoning skills relevant to our task. Based on these insights, we propose an LLM-based strategy for geocoding compositional location references. We show that our approach improves performance for the task and that a relatively small fine-tuned LLM can achieve comparable performance with much larger off-the-shelf models.

Contextual morphologically-guided tokenization for Latin encoder models
Marisa Hudspeth | Patrick J. Burns | Brendan O'Connor
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Tokenization is a critical component of language model pretraining, yet standard tokenization methods often prioritize information-theoretical goals like high compression and low fertility rather than linguistic goals like morphological alignment. In fact, they have been shown to be suboptimal for morphologically rich languages, where tokenization quality directly impacts downstream performance. In this work, we investigate morphologically-aware tokenization for Latin, a morphologically rich language that is medium-resource in terms of pretraining data, but high-resource in terms of curated lexical resources – a distinction that is often overlooked but critical in discussions of low-resource language modeling. We find that morphologically-guided tokenization improves overall performance on four downstream tasks. Performance gains are most pronounced for out of domain texts, highlighting our models’ improved generalization ability. Our findings demonstrate the utility of linguistic resources to improve language modeling for morphologically complex languages. For low-resource languages that lack large-scale pretraining data, the development and incorporation of linguistic resources can serve as a feasible alternative to improve LM performance.

2025

Automated main concept generation for narrative discourse assessment in aphasia
Ankita Gupta | Marisa Hudspeth | Polly Stokes | Jacquie Kurland | Brendan O’Connor
Findings of the Association for Computational Linguistics: ACL 2025

We present an interesting application of narrative understanding in the clinical assessment of aphasia, where story retelling tasks are used to evaluate a patient’s communication abilities. This clinical setting provides a framework to help operationalize narrative discourse analysis and an application-focused evaluation method for narrative understanding systems. In particular, we highlight the use of main concepts (MCs)—a list of statements that capture a story’s gist—for aphasic discourse analysis. We then propose automatically generating MCs from novel stories, which experts can edit manually, thus enabling wider adaptation of current assessment tools. We further develop a prompt ensemble method using large language models (LLMs) to automatically generate MCs for a novel story. We evaluate our method on an existing narrative summarization dataset to establish its intrinsic validity. We further apply it to a set of stories that have been annotated with MCs through extensive analysis of retells from non-aphasic and aphasic participants (Kurland et al., 2021, 2025). Our results show that our proposed method can generate most of the gold-standard MCs for stories from this dataset. Finally, we release this dataset of stories with annotated MCs to spur more research in this area.

𝛿-Stance: A Large-Scale Real World Dataset of Stances in Legal Argumentation
Ankita Gupta | Douglas Rice | Brendan O’Connor
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present 𝛿-Stance, a large-scale dataset of stances involved in legal argumentation. 𝛿-Stance contains stance-annotated argument pairs, semi-automatically mined from millions of examples of U.S. judges citing precedent in context using citation signals. The dataset aims to facilitate work on the legal argument stance classification task, which involves assessing whether a case summary strengthens or weakens a legal argument (polarity) and to what extent (intensity). To assess the complexity of this task, we evaluate various existing NLP methods, including zero-shot prompting proprietary large language models (LLMs), and supervised fine-tuning of smaller open-weight language models (LMs) on 𝛿-Stance. Our findings reveal that although prompting proprietary LLMs can help predict stance polarity, supervised model fine-tuning on 𝛿-Stance is necessary to distinguish intensity. We further find that alternative strategies such as domain-specific pretraining and zero-shot prompting using masked LMs remain insufficient. Beyond our dataset’s utility for the legal domain, we further find that fine-tuning small LMs on 𝛿-Stance improves their performance in other domains. Finally, we study how temporal changes in signal definition can impact model performance, highlighting the importance of careful data curation for downstream tasks by considering the historical and sociocultural context. We publish the associated dataset to foster further research on legal argument reasoning.

2024

Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input
Tessa Masis | Brendan O’Connor
Proceedings of the Sixth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS 2024)

Geo-entity linking is the task of linking a location mention to the real-world geographic location. In this we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available and existing ones are often rule-based, which break easily in social media settings, or LLM-based, which are too expensive for large-scale datasets. We present a method which represents real-world locations as averaged embeddings from labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and discuss progress and problems with evaluating at different geographic granularities.

Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time
Marisa Hudspeth | Brendan O’Connor | Laure Thompson
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)

Existing Latin treebanks draw from Latin’s long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks’ annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits that draw from the existing treebanks which we use to perform a broad cross-time analysis for POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.

The State of Relation Extraction Data Quality: Is Bigger Always Better?
Erica Cai | Brendan O’Connor
Findings of the Association for Computational Linguistics: ACL 2024

Relation extraction (RE) extracts structured tuples of relationships (e.g. friend, enemy) between entities (e.g. Sherlock Holmes, John Watson) from text, with exciting potential applications. Hundreds of RE papers have been published in recent years; do their evaluation practices inform these goals? We review recent surveys and a sample of recent RE methods papers, compiling 38 datasets currently being used. Unfortunately, many have frequent label errors, and ones with known problems continue to be used. Many datasets focus on producing labels for a large number of relation types, often through error-prone annotation methods (e.g. distant supervision or crowdsourcing), and many recent papers rely exclusively on such datasets. We draw attention to a promising alternative: datasets with a small number of relations, often in specific domains like chemistry, finance, or biomedicine, where it is possible to obtain high quality expert annotations; such data can more realistically evaluate RE performance. The research community should consider more often using such resources.

Harnessing Toulmin’s theory for zero-shot argument explication
Ankita Gupta | Ethan Zuckerman | Brendan O’Connor
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To better analyze informal arguments on public forums, we propose the task of argument explication, which makes explicit a text’s argumentative structure and implicit reasoning by outputting triples of propositions ⟨claim, reason warrant⟩. The three slots, or argument components, are derived from the widely known Toulmin (1958) model of argumentation. While prior research applies Toulmin or related theories to annotate datasets and train supervised models, we develop an effective method to prompt generative large language models (LMs) to output explicitly named argument components proposed by Toulmin by prompting with the theory name (e.g., ‘According to Toulmin model’). We evaluate the outputs’ coverage and validity through a human study and automatic evaluation based on prior argumentation datasets and perform robustness checks over alternative LMs, prompts, and argumentation theories. Finally, we conduct a proof-of-concept case study to extract an interpretable argumentation (hyper)graph from a large corpus of critical public comments on whether to allow the COVID-19 vaccine for children, suggesting future directions for corpus analysis and argument visualization.

2023

ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution
Ankita Gupta | Marzena Karpinska | Wenlong Zhao | Kalpesh Krishna | Jack Merullo | Luke Yeh | Mohit Iyyer | Brendan O’Connor
Findings of the Association for Computational Linguistics: EACL 2023

Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreferences and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers to curate a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable quality annotations were already achievable (90% agreement between the crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify the presence of linguistic cases that our annotators unanimously agree upon but lack unified treatments (e.g., generic pronouns, appositives) in existing datasets. We propose the research community should revisit these phenomena when curating future unified annotation guidelines.

Evaluating Zero-Shot Event Structures: Recommendations for Automatic Content Extraction (ACE) Annotations
Erica Cai | Brendan O’Connor
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Zero-shot event extraction (EE) methods infer richly structured event records from text, based only on a minimal user specification and no training examples, which enables flexibility in exploring and developing applications. Most event extraction research uses the Automatic Content Extraction (ACE) annotated dataset to evaluate supervised EE methods, but can it be used to evaluate zero-shot and other low-supervision EE? We describe ACE’s event structures and identify significant ambiguities and issues in current evaluation practice, including (1) coreferent argument mentions, (2) conflicting argument head conventions, and (3) ignorance of modality and event class details. By sometimes mishandling these subtleties, current work may dramatically understate the actual performance of zero-shot and other low-supervision EE, considering up to 32% of correctly identified arguments and 25% of correctly ignored event mentions as false negatives. For each issue, we propose recommendations for future evaluations so the research community can better utilize ACE as an event evaluation resource.

2022

Cross-Dialect Social Media Dependency Parsing for Social Scientific Entity Attribute Analysis
Chloe Eggleston | Brendan O’Connor
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)

In this paper, we utilize recent advancements in social media natural language processing to obtain state-of-the-art syntactic dependency parsing results for social media English. We observe performance gains of 3.4 UAS and 4.0 LAS against the previous state-of-the-art as well as less disparity between African-American and Mainstream American English dialects. We demonstrate the computational social scientific utility of this parser for the task of socially embedded entity attribute analysis: for a specified entity, derive its semantic relationships from parses’ rich syntax, and accumulate and compare them across social variables. We conduct a case study on politicized views of U.S. official Anthony Fauci during the COVID-19 pandemic.

Examining Political Rhetoric with Epistemic Stance Detection
Ankita Gupta | Su Lin Blodgett | Justin H Gross | Brendan O’Connor
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Participants in political discourse employ rhetorical strategies—such as hedging, attributions, or denials—to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders—respected allies and opposed bogeymen—across U.S. political ideologies.

Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
David Bamman | Dirk Hovy | David Jurgens | Katherine Keith | Brendan O'Connor | Svitlana Volkova
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Proceedings of the Second Workshop on Bridging Human--Computer Interaction and Natural Language Processing
Su Lin Blodgett | Hal Daumé III | Michael Madaio | Ani Nenkova | Brendan O'Connor | Hanna Wallach | Qian Yang
Proceedings of the Second Workshop on Bridging Human--Computer Interaction and Natural Language Processing

Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties
Tessa Masis | Anissa Neal | Lisa Green | Brendan O’Connor
Proceedings of the First Workshop on NLP applications to field linguistics

The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature’s distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature’s function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.

2021

Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing
Su Lin Blodgett | Michael Madaio | Brendan O'Connor | Hanna Wallach | Qian Yang
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing

Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence
Andrew Halterman | Katherine A. Keith | Sheikh Sarwar | Brendan O’Connor
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects
Katherine A. Keith | Douglas Rice | Brendan O’Connor
Proceedings of the First Workshop on Causal Inference and NLP

Using observed language to understand interpersonal interactions is important in high-stakes decision making. We propose a causal research design for observational (non-experimental) data to estimate the natural direct and indirect effects of social group signals (e.g. race or gender) on speakers’ responses with separate aspects of language as causal mediators. We illustrate the promises and challenges of this framework via a theoretical case study of the effect of an advocate’s gender on interruptions from justices during U.S. Supreme Court oral arguments. We also discuss challenges conceptualizing and operationalizing causal variables such as gender and language that comprise of many components, and we articulate technical open challenges such as temporal dependence between language mediators in conversational settings.

2020

Analyzing Gender Bias within Narrative Tropes
Dhruvil Gala | Mohammad Omar Khursheed | Hannah Lerner | Brendan O’Connor | Mohit Iyyer
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Popular media reflects and reinforces societal biases through the use of tropes, which are narrative elements, such as archetypal characters and plot arcs, that occur frequently across media. In this paper, we specifically investigate gender bias within a large collection of tropes. To enable our study, we crawl tvtropes.org, an online user-created repository that contains 30K tropes associated with 1.9M examples of their occurrences across film, television, and literature. We automatically score the “genderedness” of each trope in our TVTROPES dataset, which enables an analysis of (1) highly-gendered topics within tropes, (2) the relationship between gender bias and popular reception, and (3) how the gender of a work’s creator correlates with the types of tropes that they use.

Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty
Katherine A. Keith | Christoph Teichmann | Brendan O’Connor | Edgar Meij
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive impact in both the private sector and academia. Yet, as we revisit and extend the original authors’ annotations and text measurements we find interesting text-as-data methodological research questions: (1) Are annotator disagreements a reflection of ambiguity in language? (2) Do alternative text measurements correlate with one another and with measures of external predictive validity? We find for this application (1) some annotator disagreements of economic policy uncertainty can be attributed to ambiguity in language, and (2) switching measurements from keyword-matching to supervised machine learning classifiers results in low correlation, a concerning implication for the validity of the index.

Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
David Bamman | Dirk Hovy | David Jurgens | Brendan O'Connor | Svitlana Volkova
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates
Katherine A. Keith | David Jensen | Brendan O’Connor
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an individual’s entire history of social media posts or the content of a news article could provide a rich measurement of multiple confounders. Yet, methods and applications for this problem are scattered across different communities and evaluation practices are inconsistent. This review is the first to gather and categorize these examples and provide a guide to data-processing and evaluation decisions. Despite increased attention on adjusting for confounding using text, there are still many open problems, which we highlight in this paper.

2019

Proceedings of the Society for Computation in Linguistics (SCiL) 2019
Gaja Jarosz | Max Nelson | Brendan O’Connor | Joe Pater
Proceedings of the Society for Computation in Linguistics (SCiL) 2019

Summarizing Relationships for Interactive Concept Map Browsers
Abram Handler | Premkumar Ganeshkumar | Brendan O’Connor | Mohamed AlTantawy
Proceedings of the 2nd Workshop on New Frontiers in Summarization

Concept maps are visual summaries, structured as directed graphs: important concepts from a dataset are displayed as vertexes, and edges between vertexes show natural language descriptions of the relationships between the concepts on the map. Thus far, preliminary attempts at automatically creating concept maps have focused on building static summaries. However, in interactive settings, users will need to dynamically investigate particular relationships between pairs of concepts. For instance, a historian using a concept map browser might decide to investigate the relationship between two politicians in a news archive. We present a model which responds to such queries by returning one or more short, importance-ranked, natural language descriptions of the relationship between two requested concepts, for display in a visual interface. Our model is trained on a new public dataset, collected for this task.

Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts
Jack Merullo | Luke Yeh | Abram Handler | Alvin Grissom II | Brendan O’Connor | Mohit Iyyer
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes. Prior studies based on small datasets and manual coding show that such theatrics evince commentator bias in sports broadcasts. To examine this phenomenon, we assemble FOOTBALL, which contains 1,455 broadcast transcripts from American football games across six decades that are automatically annotated with 250K player mentions and linked with racial metadata. We identify major confounding factors for researchers examining racial bias in FOOTBALL, and perform a computational analysis that supports conclusions from prior social science studies.

Query-focused Sentence Compression in Linear Time
Abram Handler | Brendan O’Connor
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Search applications often display shortened sentences which must contain certain query terms and must fit within the space constraints of a user interface. This work introduces a new transition-based sentence compression technique developed for such settings. Our query-focused method constructs length and lexically constrained compressions in linear time, by growing a subgraph in the dependency parse of a sentence. This theoretically efficient approach achieves an 11x empirical speedup over baseline ILP methods, while better reconstructing gold constrained shortenings. Such speedups help query-focused applications, because users are measurably hindered by interface lags. Additionally, our technique does not require an ILP solver or a GPU.

2018

Evaluating Grammaticality in Seq2seq Models with a Broad Coverage HPSG Grammar: A Case Study on Machine Translation
Johnny Wei | Khiem Pham | Brendan O’Connor | Brian Dillon
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Sequence to sequence (seq2seq) models are often employed in settings where the target output is natural language. However, the syntactic properties of the language generated from these models are not well understood. We explore whether such output belongs to a formal and realistic grammar, by employing the English Resource Grammar (ERG), a broad coverage, linguistically precise HPSG-based grammar of English. From a French to English parallel corpus, we analyze the parseability and grammatical constructions occurring in output from a seq2seq translation model. Over 93% of the model translations are parseable, suggesting that it learns to generate conforming to a grammar. The model has trouble learning the distribution of rarer syntactic rules, and we pinpoint several constructions that differentiate translations between the references and our model.

Proceedings of the Society for Computation in Linguistics (SCiL) 2018
Gaja Jarosz | Brendan O’Connor | Joe Pater
Proceedings of the Society for Computation in Linguistics (SCiL) 2018

Twitter Universal Dependency Parsing for African-American and Mainstream American English
Su Lin Blodgett | Johnny Wei | Brendan O’Connor
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Due to the presence of both Twitter-specific conventions and non-standard and dialectal language, Twitter presents a significant parsing challenge to current dependency parsing tools. We broaden English dependency parsing to handle social media English, particularly social media African-American English (AAE), by developing and annotating a new dataset of 500 tweets, 250 of which are in AAE, within the Universal Dependencies 2.0 framework. We describe our standards for handling Twitter- and AAE-specific features and evaluate a variety of cross-domain strategies for improving parsing with no, or very little, in-domain labeled data, including a new data synthesis approach. We analyze these methods’ impact on performance disparities between AAE and Mainstream American English tweets, and assess parsing accuracy for specific AAE lexical and syntactic features. Our annotated data and a parsing model are available at: http://slanglab.cs.umass.edu/TwitterAAE/.

Relational Summarization for Corpus Analysis
Abram Handler | Brendan O’Connor
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

This work introduces a new problem, relational summarization, in which the goal is to generate a natural language summary of the relationship between two lexical items in a corpus, without reference to a knowledge base. Motivated by the needs of novel user interfaces, we define the task and give examples of its application. We also present a new query-focused method for finding natural language sentences which express relationships. Our method allows for summarization of more than two times more query pairs than baseline relation extractors, while returning measurably more readable output. Finally, to help guide future work, we analyze the challenges of relational summarization using both a news and a social media corpus.

Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses
Katherine A. Keith | Su Lin Blodgett | Brendan O’Connor
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full joint distribution of parse trees defined by a transition-based parsing model, and demonstrate the use of the samples in probabilistic dependency analysis. First, we define the new task of dependency path prediction, inferring syntactic substructures over part of a sentence, and provide the first analysis of performance on this task. Second, we demonstrate the usefulness of our Monte Carlo syntax marginal method for parser error analysis and calibration. Finally, we use this method to propagate parse uncertainty to two downstream information extraction applications: identifying persons killed by police and semantic role assignment.

Uncertainty-aware generative models for inferring document class prevalence
Katherine A. Keith | Brendan O’Connor
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Prevalence estimation is the task of inferring the relative frequency of classes of unlabeled examples in a group—for example, the proportion of a document collection with positive sentiment. Previous work has focused on aggregating and adjusting discriminative individual classifiers to obtain prevalence point estimates. But imperfect classifier accuracy ought to be reflected in uncertainty over the predicted prevalence for scientifically valid inference. In this work, we present (1) a generative probabilistic modeling approach to prevalence estimation, and (2) the construction and evaluation of prevalence confidence intervals; in particular, we demonstrate that an off-the-shelf discriminative classifier can be given a generative re-interpretation, by backing out an implicit individual-level likelihood function, which can be used to conduct fast and simple group-level Bayesian inference. Empirically, we demonstrate our approach provides better confidence interval coverage than an alternative, and is dramatically more robust to shifts in the class prior between training and testing.

2017

A Dataset and Classifier for Recognizing Social Media English
Su Lin Blodgett | Johnny Wei | Brendan O’Connor
Proceedings of the 3rd Workshop on Noisy User-generated Text

While language identification works well on standard texts, it performs much worse on social media language, in particular dialectal language—even for English. First, to support work on English language identification, we contribute a new dataset of tweets annotated for English versus non-English, with attention to ambiguity, code-switching, and automatic generation issues. It is randomly sampled from all public messages, avoiding biases towards pre-existing language classifiers. Second, we find that a demographic language model—which identifies messages with language similar to that used by several U.S. ethnic populations on Twitter—can be used to improve English language identification performance when combined with a traditional supervised language identifier. It increases recall with almost no loss of precision, including, surprisingly, for English messages written by non-U.S. authors. Our dataset and identifier ensemble are available online.

Proceedings of the Second Workshop on NLP and Computational Social Science
Dirk Hovy | Svitlana Volkova | David Bamman | David Jurgens | Brendan O’Connor | Oren Tsur | A. Seza Doğruöz
Proceedings of the Second Workshop on NLP and Computational Social Science

Identifying civilians killed by police with distantly supervised entity-event extraction
Katherine A. Keith | Abram Handler | Michael Pinkham | Cara Magliozzi | Joshua McDuffie | Brendan O’Connor
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We propose a new, socially-impactful task for natural language processing: from a news corpus, extract names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and present a model to solve this problem that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outperforms two off-the-shelf event extractor systems, and it can suggest candidate victim names in some cases faster than one of the major manually-collected police fatality databases.

2016

Bag of What? Simple Noun Phrase Extraction for Text Analysis
Abram Handler | Matthew Denny | Hanna Wallach | Brendan O’Connor
Proceedings of the First Workshop on NLP and Computational Social Science

Proceedings of the First Workshop on NLP and Computational Social Science
David Bamman | A. Seza Doğruöz | Jacob Eisenstein | Dirk Hovy | David Jurgens | Brendan O’Connor | Alice Oh | Oren Tsur | Svitlana Volkova
Proceedings of the First Workshop on NLP and Computational Social Science

Demographic Dialectal Variation in Social Media: A Case Study of African-American English
Su Lin Blodgett | Lisa Green | Brendan O’Connor
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Posterior calibration and exploratory analysis for natural language processing models
Khanh Nguyen | Brendan O’Connor
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

MiTextExplorer: Linked brushing and mutual information for exploratory text data analysis
Brendan O’Connor
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces

CMU: Arc-Factored, Discriminative Semantic Dependency Parsing
Sam Thomson | Brendan O’Connor | Jeffrey Flanigan | David Bamman | Jesse Dodge | Swabha Swayamdipta | Nathan Schneider | Chris Dyer | Noah A. Smith
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

A Framework for (Under)specifying Dependency Syntax without Overloading Annotators
Nathan Schneider | Brendan O’Connor | Naomi Saphra | David Bamman | Manaal Faruqui | Noah A. Smith | Chris Dyer | Jason Baldridge
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Learning to Extract International Relations from Political Context
Brendan O’Connor | Brandon M. Stewart | Noah A. Smith
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning Latent Personas of Film Characters
David Bamman | Brendan O’Connor | Noah A. Smith
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
Olutobi Owoputi | Brendan O’Connor | Chris Dyer | Kevin Gimpel | Nathan Schneider | Noah A. Smith
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Kevin Gimpel | Nathan Schneider | Brendan O’Connor | Dipanjan Das | Daniel Mills | Jacob Eisenstein | Michael Heilman | Dani Yogatama | Jeffrey Flanigan | Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Predicting a Scientific Community’s Response to an Article
Dani Yogatama | Michael Heilman | Brendan O’Connor | Chris Dyer | Bryan R. Routledge | Noah A. Smith
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

A Latent Variable Model for Geographic Lexical Variation
Jacob Eisenstein | Brendan O’Connor | Noah A. Smith | Eric P. Xing
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2008

Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks
Rion Snow | Brendan O’Connor | Daniel Jurafsky | Andrew Ng
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Co-authors

David Jurgens 4

Nathan Schneider 4

Svitlana Volkova 4

Jacob Eisenstein 3

Marisa Hudspeth 3

Hanna Wallach 3

A. Seza Doğruöz 2

Jeffrey Flanigan 2

Michael Heilman 2

Michael Madaio 2

Dani Yogatama 2

Mohamed Altantawy 1

Jason Baldridge 1

Patrick J. Burns 1

Hal Daumé III 1

Matthew Denny 1

Brian W. Dillon 1

Chloe Eggleston 1

Manaal Faruqui 1

Premkumar Ganeshkumar 1

Alvin Grissom II 1

Justin H. Gross 1

Andrew Halterman 1

Marzena Karpinska 1

Mohammad Omar Khursheed 1

Kalpesh Krishna 1

Jacquie Kurland 1

Hannah Lerner 1

Cara Magliozzi 1

Joshua McDuffie 1

Daniel P. Mills 1

Olutobi Owoputi 1

Michael Pinkham 1

Bryan R. Routledge 1

Sheikh Sarwar 1

Brandon M. Stewart 1

Swabha Swayamdipta 1

Christoph Teichmann 1

Laure Thompson 1

Ethan Zuckerman 1

Venues