Hal Daumé Iii

Also published as: Hal Daume, Hal Daume III, Hal Daumé, Hal Daumé III


2024

pdf bib
Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections
Lingjun Zhao | Khanh Xuan Nguyen | Hal Daumé Iii
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Language models will inevitably err in situations with which they are unfamiliar. However, by effectively communicating uncertainties, they can still guide humans toward making sound decisions in those contexts. We demonstrate this idea by developing HEAR, a system that can successfully guide humans in simulated residential environments despite generating potentially inaccurate instructions. Diverging from systems that provide users with only the instructions they generate, HEAR warns users of potential errors in its instructions and suggests corrections. This rich uncertainty information effectively prevents misguidance and reduces the search space for users. Evaluation with 80 users shows that HEAR achieves a 13% increase in success rate and a 29% reduction in final location error distance compared to only presenting instructions to users. Interestingly, we find that offering users possibilities to explore, HEAR motivates them to make more attempts at the task, ultimately leading to a higher success rate. To our best knowledge, this work is the first to show the practical benefits of uncertainty communication in a long-horizon sequential decision-making problem.

pdf bib
Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators through a User-Centric Method
Yang Trista Cao | Lovely-Frances Domingo | Sarah Gilbert | Michelle L. Mazurek | Katie Shilton | Hal Daumé Iii
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Extensive efforts in automated approaches for content moderation have been focused on developing models to identify toxic, offensive, and hateful content with the aim of lightening the load for moderators. Yet, it remains uncertain whether improvements on those tasks have truly addressed moderators’ needs in accomplishing their work. In this paper, we surface gaps between past research efforts that have aimed to provide automation for aspects of content moderation and the needs of volunteer content moderators, regarding identifying violations of various moderation rules. To do so, we conduct a model review on Hugging Face to reveal the availability of models to cover various moderation rules and guidelines from three exemplar forums. We further put state-of-the-art LLMs to the test, evaluating how well these models perform in flagging violations of platform rules from one particular forum. Finally, we conduct a user survey study with volunteer moderators to gain insight into their perspectives on useful moderation models. Overall, we observe a non trivial gap, as missing developed models and LLMs exhibit moderate to low performance on a significant portion of the rules. Moderators’ reports provide guides for future work on developing moderation assistant models.

pdf bib
“You Gotta be a Doctor, Lin” : An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations
Huy Nghiem | John Prindle | Jieyu Zhao | Hal Daumé Iii
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Social science research has shown that candidates with names indicative of certain races or genders often face discrimination in employment practices. Similarly, Large Language Models (LLMs) have demonstrated racial and gender biases in various applications. In this study, we utilize GPT-3.5-Turbo and Llama 3-70B-Instruct to simulate hiring decisions and salary recommendations for candidates with 320 first names that strongly signal their race and gender, across over 750,000 prompts. Our empirical results indicate a preference among these models for hiring candidates with White female-sounding names over other demographic groups across 40 occupations. Additionally, even among candidates with identical qualifications, salary recommendations vary by as much as 5% between different subgroups. A comparison with real-world labor data reveals inconsistent alignment with U.S. labor market characteristics, underscoring the necessity of risk investigation of LLM-powered systems.

pdf bib
ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles
Kayo Yin | Chinmay Singh | Fyodor O Minakov | Vanessa Milan | Hal Daumé Iii | Cyril Zhang | Alex Xijie Lu | Danielle Bragg
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Deaf and hard-of-hearing (DHH) students face significant barriers in accessing science, technology, engineering, and mathematics (STEM) education, notably due to the scarcity of STEM resources in signed languages. To help address this, we introduce ASL STEM Wiki: a parallel corpus of 254 Wikipedia articles on STEM topics in English, interpreted into over 300 hours of American Sign Language (ASL). ASL STEM Wiki is the first continuous signing dataset focused on STEM, facilitating the development of AI resources for STEM education in ASL.We identify several use cases of ASL STEM Wiki with human-centered applications. For example, because this dataset highlights the frequent use of fingerspelling for technical concepts, which inhibits DHH students’ ability to learn,we develop models to identify fingerspelled words—which can later be used to query for appropriate ASL signs to suggest to interpreters.

pdf bib
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
Maharshi Gor | Hal Daumé Iii | Tianyi Zhou | Jordan Lee Boyd-Graber
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Recent advancements of large language models (LLMs)have led to claims of AI surpassing humansin natural language processing NLP tasks such as textual understanding and reasoning.%This work investigates these assertions by introducingCAIMIRA, a novel framework rooted in item response theory IRTthat enables quantitative assessment and comparison of problem-solving abilities inquestion-answering QA agents.%Through analysis of over 300,000 responses from ~ 70 AI systemsand 155 humans across thousands of quiz questions, CAIMIRA uncovers distinctproficiency patterns in knowledge domains and reasoning skills. %Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning,while state-of-the-art LLMs like GPT-4 Turbo and Llama-3-70B demonstrate superior performance ontargeted information retrieval and fact-based reasoning, particularly when information gapsare well-defined and addressable through pattern matching or data retrieval.%These findings identify key areas for future QA tasks and model development,highlighting the critical need for questions that not only challengehigher-order reasoning and scientific thinking, but also demand nuanced linguisticand cross-contextual application.

pdf bib
Understanding the Impacts of Language Technologies’ Performance Disparities on African American Language Speakers
Jay Cunningham | Su Lin Blodgett | Michael Madaio | Hal Daumé Iii | Christina Harrington | Hanna Wallach
Findings of the Association for Computational Linguistics: ACL 2024

This paper examines the experiences of African American Language (AAL) speakers when using language technologies. Previous work has used quantitative methods to uncover performance disparities between AAL speakers and White Mainstream English speakers when using language technologies, but has not sought to understand the impacts of these performance disparities on AAL speakers. Through interviews with 19 AAL speakers, we focus on understanding such impacts in a contextualized and human-centered manner. We find that AAL speakers often undertake invisible labor of adapting their speech patterns to successfully use language technologies, and they make connections between failures of language technologies for AAL speakers and a lack of inclusion of AAL speakers in language technology design processes and datasets. Our findings suggest that NLP researchers and practitioners should invest in developing contextualized and human-centered evaluations of language technologies that seek to understand the impacts of performance disparities on speakers of underrepresented languages and language varieties.

pdf bib
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
Huy Nghiem | Hal Daumé Iii
Findings of the Association for Computational Linguistics: EMNLP 2024

The widespread use of social media necessitates reliable and efficient detection of offensive content to mitigate harmful effects. Although sophisticated models perform well on individual datasets, they often fail to generalize due to varying definitions and labeling of “offensive content.” In this paper, we introduce HateCOT, an English dataset with over 52,000 samples from diverse sources, featuring explanations generated by GPT-3.5Turbo and curated by humans. We demonstrate that pretraining on HateCOT significantly enhances the performance of open-source Large Language Models on three benchmark datasets for offensive content detection in both zero-shot and few-shot settings, despite differences in domain and task. Additionally, HateCOT facilitates effective K-shot fine-tuning of LLMs with limited data and improves the quality of their explanations, as confirmed by our human evaluation.

pdf bib
Large Language Models Help Humans Verify Truthfulness – Except When They Are Convincingly Wrong
Chenglei Si | Navita Goyal | Tongshuang Wu | Chen Zhao | Shi Feng | Hal Daumé Iii | Jordan Boyd-Graber
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) are increasingly used for accessing information on the web. Their truthfulness and factuality are thus of great interest. To help users make the right decisions about the information they get, LLMs should not only provide information but also help users fact-check it. We conduct human experiments with 80 crowdworkers to compare language models with search engines (information retrieval systems) at facilitating fact-checking. We prompt LLMs to validate a given claim and provide corresponding explanations. Users reading LLM explanations are significantly more efficient than those using search engines while achieving similar accuracy. However, they over-rely on the LLMs when the explanation is wrong. To reduce over-reliance on LLMs, we ask LLMs to provide contrastive information—explain both why the claim is true and false, and then we present both sides of the explanation to users. This contrastive explanation mitigates users’ over-reliance on LLMs, but cannot significantly outperform search engines. Further, showing both search engine results and LLM explanations offers no complementary benefits compared to search engines alone. Taken together, our study highlights that natural language explanations by LLMs may not be a reliable replacement for reading the retrieved passages, especially in high-stakes settings where over-relying on wrong AI explanations could lead to critical consequences.

2023

pdf bib
FairPrism: Evaluating Fairness-Related Harms in Text Generation
Eve Fleisig | Aubrie Amstutz | Chad Atalla | Su Lin Blodgett | Hal Daumé III | Alexandra Olteanu | Emily Sheng | Dan Vann | Hanna Wallach
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It is critical to measure and mitigate fairness-related harms caused by AI text generation systems, including stereotyping and demeaning harms. To that end, we introduce FairPrism, a dataset of 5,000 examples of AI-generated English text with detailed human annotations covering a diverse set of harms relating to gender and sexuality. FairPrism aims to address several limitations of existing datasets for measuring and mitigating fairness-related harms, including improved transparency, clearer specification of dataset coverage, and accounting for annotator disagreement and harms that are context-dependent. FairPrism’s annotations include the extent of stereotyping and demeaning harms, the demographic groups targeted, and appropriateness for different applications. The annotations also include specific harms that occur in interactive contexts and harms that raise normative concerns when the “speaker” is an AI system. Due to its precision and granularity, FairPrism can be used to diagnose (1) the types of fairness-related harms that AI text generation systems cause, and (2) the potential limitations of mitigation methods, both of which we illustrate through case studies. Finally, the process we followed to develop FairPrism offers a recipe for building improved datasets for measuring and mitigating harms caused by AI systems.

pdf bib
Factual or Contextual? Disentangling Error Types in Entity Description Generation
Navita Goyal | Ani Nenkova | Hal Daumé III
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the task of entity description generation, given a context and a specified entity, a model must describe that entity correctly and in a contextually-relevant way. In this task, as well as broader language generation tasks, the generation of a nonfactual description (factual error) versus an incongruous description (contextual error) is fundamentally different, yet often conflated. We develop an evaluation paradigm that enables us to disentangle these two types of errors in naturally occurring textual contexts. We find that factuality and congruity are often at odds, and that models specifically struggle with accurate descriptions of entities that are less familiar to people. This shortcoming of language models raises concerns around the trustworthiness of such models, since factual errors on less well-known entities are exactly those that a human reader will not recognize.

pdf bib
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Arjun Subramonian | Xingdi Yuan | Hal Daumé III | Su Lin Blodgett
Findings of the Association for Computational Linguistics: ACL 2023

Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance are operationalized. To provide evidence for our taxonomy, we conduct a meta-analysis of relevant literature to understand how NLP tasks are conceptualized, as well as a survey of practitioners about their impressions of different factors that affect benchmark validity. Our meta-analysis and survey across eight tasks, ranging from coreference resolution to question answering, uncover that tasks are generally not clearly and consistently conceptualized and benchmarks suffer from operationalization disagreements. These findings support our proposed taxonomy of disagreement. Finally, based on our taxonomy, we present a framework for constructing benchmarks and documenting their limitations.

pdf bib
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Lingjun Zhao | Khanh Nguyen | Hal Daumé III
Findings of the Association for Computational Linguistics: ACL 2023

Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.

pdf bib
Which Examples Should be Multiply Annotated? Active Learning When Annotators May Disagree
Connor Baumler | Anna Sotnikova | Hal Daumé III
Findings of the Association for Computational Linguistics: ACL 2023

Linguistic annotations, especially for controversial topics like hate speech detection, are frequently contested due to annotator backgrounds and positionalities. In such situations, preserving this disagreement through the machine learning pipeline can be important for downstream use cases. However, capturing disagreement can increase annotation time and expense. Fortunately, for many tasks, not all examples are equally controversial; we develop an active learning approach, Disagreement Aware Active Learning (DAAL) that concentrates annotations on examples where model entropy and annotator entropy are the most different. Because we cannot know the true entropy of annotations on unlabeled examples, we estimate a model that predicts annotator entropy trained using very few multiply-labeled examples. We find that traditional uncertainty-based active learning underperforms simple passive learning on tasks with high levels of disagreement, but that our active learning approach is able to successfully improve on passive and active baselines, reducing the number of annotations required by at least 24% on average across several datasets.

pdf bib
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao | Khanh Nguyen | Hal Daumé III
Findings of the Association for Computational Linguistics: EMNLP 2023

We investigate the problem of generating instructions to guide humans to navigate in simulated residential environments. A major issue with current models is hallucination: they generate references to actions or objects that are inconsistent with what a human follower would perform or encounter along the described path. We develop a model that detects these hallucinated references by adopting a model pre-trained on a large corpus of image-text pairs, and fine-tuning it with a contrastive loss that separates correct instructions from instructions containing synthesized hallucinations. Our final model outperforms several baselines, including using word probability estimated by the instruction-generation model, and supervised models based on LSTM and Transformer.

pdf bib
What Else Do I Need to Know? The Effect of Background Information on Users’ Reliance on QA Systems
Navita Goyal | Eleftheria Briakou | Amanda Liu | Connor Baumler | Claire Bonial | Jeffrey Micher | Clare Voss | Marine Carpuat | Hal Daumé III
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

NLP systems have shown impressive performance at answering questions by retrieving relevant context. However, with the increasingly large models, it is impossible and often undesirable to constrain models’ knowledge or reasoning to only the retrieved context. This leads to a mismatch between the information that the models access to derive the answer and the information that is available to the user to assess the model predicted answer. In this work, we study how users interact with QA systems in the absence of sufficient information to assess their predictions. Further, we ask whether adding the requisite background helps mitigate users’ over-reliance on predictions. Our study reveals that users rely on model predictions even in the absence of sufficient information needed to assess the model’s correctness. Providing the relevant background, however, helps users better catch model errors, reducing over-reliance on incorrect predictions. On the flip side, background information also increases users’ confidence in their accurate as well as inaccurate judgments. Our work highlights that supporting users’ verification of QA predictions is an important, yet challenging, problem.

pdf bib
A Rose by Any Other Name would not Smell as Sweet: Social Bias in Names Mistranslation
Sandra Sandoval | Jieyu Zhao | Marine Carpuat | Hal Daumé III
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

We ask the question: Are there widespread disparities in machine translations of names across race/ethnicity, and gender? We hypothesize that the translation quality of names and surrounding context will be lower for names associated with US racial and ethnic minorities due to these systems’ tendencies to standardize language to predominant language patterns. We develop a dataset of names that are strongly demographically aligned and propose a translation evaluation procedure based on round-trip translation. We analyze the effect of name demographics on translation quality using generalized linear mixed effects models and find that the ability of translation systems to correctly translate female-associated names is significantly lower than male-associated names. This effect is particularly pronounced for female-associated names that are also associated with racial (Black) and ethnic (Hispanic) minorities. This disparity in translation quality between social groups for something as personal as someone’s name has significant implications for people’s professional, personal, and cultural identities, self-worth and ease of communication. Our findings suggest that more MT research is needed to improve the translation of names and to provide high-quality service for users regardless of gender, race, and ethnicity.

pdf bib
Towards Conceptualization of “Fair Explanation”: Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators
Tin Nguyen | Jiannan Xu | Aayushi Roy | Hal Daumé III | Marine Carpuat
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recent research at the intersection of AI explainability and fairness has focused on how explanations can improve human-plus-AI task performance as assessed by fairness measures. We propose to characterize what constitutes an explanation that is itself “fair” – an explanation that does not adversely impact specific populations. We formulate a novel evaluation method of “fair explanations” using not just accuracy and label time, but also psychological impact of explanations on different user groups across many metrics (mental discomfort, stereotype activation, and perceived workload). We apply this method in the context of content moderation of potential hate speech, and its differential impact on Asian vs. non-Asian proxy moderators, across explanation approaches (saliency map and counterfactual explanation). We find that saliency maps generally perform better and show less evidence of disparate impact (group) and individual unfairness than counterfactual explanations. Content warning: This paper contains examples of hate speech and racially discriminatory language. The authors do not support such content. Please consider your risk of discomfort carefully before continuing reading!

2022

pdf bib
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou | Su Lin Blodgett | Adam Trischler | Hal Daumé III | Kaheer Suleman | Alexandra Olteanu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners’ goals, assumptions, and constraints—which inform decisions about what, when, and how to evaluate—are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations.

pdf bib
Theory-Grounded Measurement of U.S. Social Stereotypes in English Language Models
Yang Trista Cao | Anna Sotnikova | Hal Daumé III | Rachel Rudinger | Linda Zou
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

NLP models trained on text have been shown to reproduce human stereotypes, which can magnify harms to marginalized groups when systems are deployed at scale. We adapt the Agency-Belief-Communion (ABC) stereotype model of Koch et al. (2016) from social psychology as a framework for the systematic study and discovery of stereotypic group-trait associations in language models (LMs). We introduce the sensitivity test (SeT) for measuring stereotypical associations from language models. To evaluate SeT and other measures using the ABC model, we collect group-trait judgments from U.S.-based subjects to compare with English LM stereotypes. Finally, we extend this framework to measure LM stereotyping of intersectional identities.

pdf bib
Proceedings of the Second Workshop on Bridging Human--Computer Interaction and Natural Language Processing
Su Lin Blodgett | Hal Daumé III | Michael Madaio | Ani Nenkova | Brendan O'Connor | Hanna Wallach | Qian Yang
Proceedings of the Second Workshop on Bridging Human--Computer Interaction and Natural Language Processing

pdf bib
What’s Different between Visual Question Answering for Machine “Understanding” Versus for Accessibility?
Yang Trista Cao | Kyle Seelman | Kyungjun Lee | Hal Daumé III
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine “understanding” and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating discrepancies between machine “understanding” datasets (VQA-v2) and accessibility datasets (VizWiz) by evaluating a variety of VQA models. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.

pdf bib
Heterogeneous Supervised Topic Models
Dhanya Sridhar | Hal Daumé III | David Blei
Transactions of the Association for Computational Linguistics, Volume 10

Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.

2021

pdf bib
Multi-Step Reasoning Over Unstructured Text with Beam Dense Retrieval
Chen Zhao | Chenyan Xiong | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Complex question answering often requires finding a reasoning chain that consists of multiple evidence pieces. Current approaches incorporate the strengths of structured knowledge and unstructured text, assuming text corpora is semi-structured. Building on dense retrieval methods, we propose a new multi-step retrieval approach (BeamDR) that iteratively forms an evidence chain through beam search in dense representations. When evaluated on multi-hop question answering, BeamDR is competitive to state-of-the-art systems, without using any semi-structured information. Through query composition in dense space, BeamDR captures the implicit relationships between evidence in the reasoning chain. The code is available at https://github.com/henryzhao5852/BeamDR.

pdf bib
Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle*
Yang Trista Cao | Hal Daumé III
Computational Linguistics, Volume 47, Issue 3 - November 2021

Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing data sets for trans-exclusionary biases, and develop two new data sets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.

pdf bib
Analyzing Stereotypes in Generative Text Inference Tasks
Anna Sotnikova | Yang Trista Cao | Hal Daumé III | Rachel Rudinger
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Distantly-Supervised Dense Retrieval Enables Open-Domain Question Answering without Evidence Annotation
Chen Zhao | Chenyan Xiong | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Open-domain question answering answers a question based on evidence retrieved from a large corpus. State-of-the-art neural approaches require intermediate evidence annotations for training. However, such intermediate annotations are expensive, and methods that rely on them cannot transfer to the more common setting, where only question–answer pairs are available. This paper investigates whether models can learn to find evidence from a large corpus, with only distant supervision from answer labels for model training, thereby generating no additional annotation cost. We introduce a novel approach (DistDR) that iteratively improves over a weak retriever by alternately finding evidence from the up-to-date model and encouraging the model to learn the most likely evidence. Without using any evidence labels, DistDR is on par with fully-supervised state-of-the-art methods on both multi-hop and single-hop QA benchmarks. Our analysis confirms that DistDR finds more accurate evidence over iterations, which leads to model improvements. The code is available at https://github.com/henryzhao5852/DistDR.

2020

pdf bib
Meta-Learning for Few-Shot NMT Adaptation
Amr Sharaf | Hany Hassan | Hal Daumé III
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present META-MT, a meta-learning approach to adapt Neural Machine Translation (NMT) systems in a few-shot setting. META-MT provides a new approach to make NMT models easily adaptable to many target do- mains with the minimal amount of in-domain data. We frame the adaptation of NMT systems as a meta-learning problem, where we learn to adapt to new unseen domains based on simulated offline meta-training domain adaptation tasks. We evaluate the proposed meta-learning strategy on ten domains with general large scale NMT systems. We show that META-MT significantly outperforms classical domain adaptation when very few in- domain examples are available. Our experiments shows that META-MT can outperform classical fine-tuning by up to 2.5 BLEU points after seeing only 4, 000 translated words (300 parallel sentences).

pdf bib
Active Imitation Learning with Noisy Guidance
Kianté Brantley | Amr Sharaf | Hal Daumé III
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To combat this query complexity, we consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance. Our algorithm, LEAQI, learns a difference classifier that predicts when the expert is likely to disagree with the heuristic, and queries the expert only when necessary. We apply LEAQI to three sequence labelling tasks, demonstrating significantly fewer queries to the expert and comparable (or better) accuracies over a passive approach.

pdf bib
Toward Gender-Inclusive Coreference Resolution
Yang Trista Cao | Hal Daumé III
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systemic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and develop two new datasets for interrogating bias in crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we build systems that lead to many potential harms.

pdf bib
Language (Technology) is Power: A Critical Survey of “Bias” in NLP
Su Lin Blodgett | Solon Barocas | Hal Daumé III | Hanna Wallach
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We survey 146 papers analyzing “bias” in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing “bias” is an inherently normative process. We further find that these papers’ proposed quantitative techniques for measuring or mitigating “bias” are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing “bias” in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of “bias”—i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements—and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.

pdf bib
On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
Tianze Shi | Chen Zhao | Jordan Boyd-Graber | Hal Daumé III | Lillian Lee
Findings of the Association for Computational Linguistics: EMNLP 2020

Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce SQUALL, a dataset that enriches 11,276 WIKITABLEQUESTIONS English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoderdecoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross validation, these strategies improve over strong baselines by 4.4% execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.

2019

pdf bib
Answer-based Adversarial Training for Generating Clarification Questions
Sudha Rao | Hal Daumé III
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present an approach for generating clarification questions with the goal of eliciting new information that would make the given textual context more complete. We propose that modeling hypothetical answers (to clarification questions) as latent variables can guide our approach into generating more useful clarification questions. We develop a Generative Adversarial Network (GAN) where the generator is a sequence-to-sequence model and the discriminator is a utility function that models the value of updating the context with the answer to the clarification question. We evaluate on two datasets, using both automatic metrics and human judgments of usefulness, specificity and relevance, showing that our approach outperforms both a retrieval-based model and ablations that exclude the utility model and the adversarial training.

pdf bib
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning
Khanh Nguyen | Hal Daumé III
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Mobile agents that can leverage help from humans can potentially accomplish more complex tasks than they could entirely on their own. We develop “Help, Anna!” (HANNA), an interactive photo-realistic simulator in which an agent fulfills object-finding tasks by requesting and interpreting natural language-and-vision assistance. An agent solving tasks in a HANNA environment can leverage simulated human assistants, called ANNA (Automatic Natural Navigation Assistants), which, upon request, provide natural language and visual instructions to direct the agent towards the goals. To address the HANNA problem, we develop a memory-augmented neural agent that hierarchically models multiple levels of decision-making, and an imitation learning algorithm that teaches the agent to avoid repeating past mistakes while simultaneously predicting its own chances of making future progress. Empirically, our approach is able to ask for help more effectively than competitive baselines and, thus, attains higher task success rate on both previously seen and previously unseen environments.

pdf bib
Comparing and Developing Tools to Measure the Readability of Domain-Specific Texts
Elissa Redmiles | Lisa Maszkiewicz | Emily Hwang | Dhruv Kuchhal | Everest Liu | Miraida Morales | Denis Peskov | Sudha Rao | Rock Stevens | Kristina Gligorić | Sean Kross | Michelle Mazurek | Hal Daumé III
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The readability of a digital text can influence people’s ability to learn new things about a range topics from digital resources (e.g., Wikipedia, WebMD). Readability also impacts search rankings, and is used to evaluate the performance of NLP systems. Despite this, we lack a thorough understanding of how to validly measure readability at scale, especially for domain-specific texts. In this work, we present a comparison of the validity of well-known readability measures and introduce a novel approach, Smart Cloze, which is designed to address shortcomings of existing measures. We compare these approaches across four different corpora: crowdworker-generated stories, Wikipedia articles, security and privacy advice, and health information. On these corpora, we evaluate the convergent and content validity of each measure, and detail tradeoffs in score precision, domain-specificity, and participant burden. These results provide a foundation for more accurate readability measurements and better evaluation of new natural-language-processing systems and tools.

pdf bib
Global Voices: Crossing Borders in Automatic News Summarization
Khanh Nguyen | Hal Daumé III
Proceedings of the 2nd Workshop on New Frontiers in Summarization

We construct Global Voices, a multilingual dataset for evaluating cross-lingual summarization methods. We extract social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages. Especially, for the into-English summarization task, we crowd-source a high-quality evaluation dataset based on guidelines that emphasize accuracy, coverage, and understandability. To ensure the quality of this dataset, we collect human ratings to filter out bad summaries, and conduct a survey on humans, which shows that the remaining summaries are preferred over the social-network summaries. We study the effect of translation quality in cross-lingual summarization, comparing a translate-then-summarize approach with several baselines. Our results highlight the limitations of the ROUGE metric that are overlooked in monolingual summarization.

bib
Controlling the Specificity of Clarification Question Generation
Yang Trista Cao | Sudha Rao | Hal Daumé III
Proceedings of the 2019 Workshop on Widening NLP

Unlike comprehension-style questions, clarification questions look for some missing information in a given context. However, without guidance, neural models for question generation, similar to dialog generation models, lead to generic and bland questions that cannot elicit useful information. We argue that controlling the level of specificity of the generated questions can have useful applications and propose a neural clarification question generation model for the same. We first train a classifier that annotates a clarification question with its level of specificity (generic or specific) to the given context. Our results on the Amazon questions dataset demonstrate that training a clarification question generation model on specificity annotated data can generate questions with varied levels of specificity to the given context.

bib
Non-Monotonic Sequential Text Generation
Kiante Brantley | Kyunghyun Cho | Hal Daumé | Sean Welleck
Proceedings of the 2019 Workshop on Widening NLP

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy’s own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order while achieving competitive performance with conventional left-to-right generation.

2018

pdf bib
Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information
Sudha Rao | Hal Daumé III
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Inquiry is fundamental to communication, and machines cannot effectively collaborate with humans unless they can ask questions. In this work, we build a neural network model for the task of ranking clarification questions. Our model is inspired by the idea of expected value of perfect information: a good question is one whose expected answer will be useful. We study this problem using data from StackExchange, a plentiful online resource in which people routinely ask clarifying questions to posts so that they can better offer assistance to the original poster. We create a dataset of clarification questions consisting of 77K posts paired with a clarification question (and answer) from three domains of StackExchange: askubuntu, unix and superuser. We evaluate our model on 500 samples of this dataset against expert human judgments and demonstrate significant improvements over controlled baselines.

pdf bib
Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings
Han-Chin Shing | Suraj Nair | Ayah Zirikly | Meir Friedenberg | Hal Daumé III | Philip Resnik
Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic

We report on the creation of a dataset for studying assessment of suicide risk via online postings in Reddit. Evaluation of risk-level annotations by experts yields what is, to our knowledge, the first demonstration of reliability in risk assessment by clinicians based on social media postings. We also introduce and demonstrate the value of a new, detailed rubric for assessing suicide risk, compare crowdsourced with expert performance, and present baseline predictive modeling experiments using the new dataset, which will be made available to researchers through the American Association of Suicidology.

pdf bib
Content Selection in Deep Learning Models of Summarization
Chris Kedzie | Kathleen McKeown | Hal Daumé III
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed. We find that many sophisticated features of state of the art extractive summarizers do not improve performance over simpler models. These results suggest that it is easier to create a summarizer for a new domain than previous work suggests and bring into question the benefit of deep learning models for summarization for those domains that do have massive datasets (i.e., news). At the same time, they suggest important questions for new research in summarization; namely, new forms of sentence representations or external knowledge sources are needed that are better suited to the sumarization task.

2017

pdf bib
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Khanh Nguyen | Hal Daumé III | Jordan Boyd-Graber
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.

pdf bib
Biomedical Event Extraction using Abstract Meaning Representation
Sudha Rao | Daniel Marcu | Kevin Knight | Hal Daumé III
BioNLP 2017

We propose a novel, Abstract Meaning Representation (AMR) based approach to identifying molecular events/interactions in biomedical text. Our key contributions are: (1) an empirical validation of our hypothesis that an event is a subgraph of the AMR graph, (2) a neural network-based model that identifies such an event subgraph given an AMR, and (3) a distant supervision based approach to gather additional training data. We evaluate our approach on the 2013 Genia Event Extraction dataset and show promising results.

pdf bib
Structured Prediction via Learning to Search under Bandit Feedback
Amr Sharaf | Hal Daumé III
Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing

We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of BLS on sequence labeling and dependency parsing tasks.

pdf bib
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
Amr Sharaf | Shi Feng | Khanh Nguyen | Kianté Brantley | Hal Daumé III
Proceedings of the Second Conference on Machine Translation

pdf bib
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems
Emily Bender | Hal Daumé III | Allyson Ettinger | Sudha Rao
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

pdf bib
Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task
Allyson Ettinger | Sudha Rao | Hal Daumé III | Emily M. Bender
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems, and the associated Build It Break It, The Language Edition shared task. The goal of this workshop was to bring together researchers in NLP and linguistics with a carefully designed shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, provide discussion of some highlighted results, and discuss lessons learned.

2016

pdf bib
CLIP@UMD at SemEval-2016 Task 8: Parser for Abstract Meaning Representation using Learning to Search
Sudha Rao | Yogarshi Vyas | Hal Daumé III | Philip Resnik
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation
He He | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Feuding Families and Former Friends: Unsupervised Learning for Dynamic Fictional Relationships
Mohit Iyyer | Anupam Guha | Snigdha Chaturvedi | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the Workshop on Human-Computer Question Answering
Mohit Iyyer | He He | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the Workshop on Human-Computer Question Answering

pdf bib
The UMD CLPsych 2016 Shared Task System: Text Representation for Predicting Triage of Forum Posts about Mental Health
Meir Friedenberg | Hadi Amiri | Hal Daumé III | Philip Resnik
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

pdf bib
A Framework for Discriminative Rule Selection in Hierarchical Moses
Fabienne Braune | Alexander Fraser | Hal Daumé III | Aleš Tamchyna
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

pdf bib
Learning Text Pair Similarity with Context-sensitive Autoencoders
Hadi Amiri | Philip Resnik | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Dialogue focus tracking for zero pronoun resolution
Sudha Rao | Allyson Ettinger | Hal Daumé III | Philip Resnik
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Hands-on Learning to Search for Structured Prediction
Hal Daumé III | John Langford | Kai-Wei Chang | He He | Sudha Rao
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Syntax-based Rewriting for Simultaneous Machine Translation
He He | Alvin Grissom II | John Morgan | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Why discourse affects speakers’ choice of referring expressions
Naho Orita | Eliana Vornov | Naomi Feldman | Hal Daumé III
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Deep Unordered Composition Rivals Syntactic Methods for Text Classification
Mohit Iyyer | Varun Manjunatha | Jordan Boyd-Graber | Hal Daumé III
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Understanding MOOC Discussion Forums using Seeded LDA
Arti Ramesh | Dan Goldwasser | Bert Huang | Hal Daumé | Lise Getoor
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
A Neural Network for Factoid Question Answering over Paragraphs
Mohit Iyyer | Jordan Boyd-Graber | Leonardo Claudino | Richard Socher | Hal Daumé III
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation
Alvin Grissom II | He He | Jordan Boyd-Graber | John Morgan | Hal Daumé III
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation
Junhui Li | Yuval Marton | Philip Resnik | Hal Daumé III
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Predicting Instructor’s Intervention in MOOC forums
Snigdha Chaturvedi | Dan Goldwasser | Hal Daumé III
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
I Object!” Modeling Latent Pragmatic Effects in Courtroom Dialogues
Dan Goldwasser | Hal Daumé III
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

pdf bib
Monolingual Marginal Matching for Translation Model Adaptation
Ann Irvine | Chris Quirk | Hal Daumé III
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Dynamic Feature Selection for Dependency Parsing
He He | Hal Daumé III | Jason Eisner
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
SenseSpotting: Never let your parallel data tie you to an old domain
Marine Carpuat | Hal Daumé III | Katharine Henry | Ann Irvine | Jagadeesh Jagarlamudi | Rachel Rudinger
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Lucy Vanderwende | Hal Daumé III | Katrin Kirchhoff
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Modeling Syntactic and Semantic Structures in Hierarchical Phrase-based Translation
Junhui Li | Philip Resnik | Hal Daumé III
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Measuring Machine Translation Errors in New Domains
Ann Irvine | John Morgan | Marine Carpuat | Hal Daumé III | Dragos Munteanu
Transactions of the Association for Computational Linguistics, Volume 1

We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis for word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.

2012

pdf bib
Low-Dimensional Discriminative Reranking
Jagadeesh Jagarlamudi | Hal Daumé III
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Detecting Visual Text
Jesse Dodge | Amit Goyal | Xufeng Han | Alyssa Mensch | Margaret Mitchell | Karl Stratos | Kota Yamaguchi | Yejin Choi | Hal Daumé III | Alex Berg | Tamara Berg
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the First Workshop on Multilingual Modeling
Jagadeesh Jagarlamudi | Sujith Ravi | Xiaojun Wan | Hal Daume III
Proceedings of the First Workshop on Multilingual Modeling

pdf bib
Incorporating Lexical Priors into Topic Models
Jagadeesh Jagarlamudi | Hal Daumé III | Raghavendra Udupa
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Midge: Generating Image Descriptions From Computer Vision Detections
Margaret Mitchell | Jesse Dodge | Amit Goyal | Kota Yamaguchi | Karl Stratos | Xufeng Han | Alyssa Mensch | Alex Berg | Tamara Berg | Hal Daumé III
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Regularized Interlingual Projections: Evaluation on Multilingual Transliteration
Jagadeesh Jagarlamudi | Hal Daumé III
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Fast Large-Scale Approximate Graph Construction for NLP
Amit Goyal | Hal Daumé III | Raul Guerra
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Sketch Algorithms for Estimating Point Queries in NLP
Amit Goyal | Hal Daumé III | Graham Cormode
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Besting the Quiz Master: Crowdsourcing Incremental Classification Games
Jordan Boyd-Graber | Brianna Satinoff | He He | Hal Daumé III
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Domain Adaptation in Machine Translation: Findings from the 2012 Johns Hopkins Summer Workshop
Hal Daumé III | Marine Carpuat | Alex Fraser | Chris Quirk
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Keynote Presentations

2011

pdf bib
From Bilingual Dictionaries to Interlingual Document Representations
Jagadeesh Jagarlamudi | Hal Daumé III | Raghavendra Udupa
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Domain Adaptation for Machine Translation by Mining Unseen Words
Hal Daumé III | Jagadeesh Jagarlamudi
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Beyond Structured Prediction: Inverse Reinforcement Learning
Hal Daumé III
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Generating Semantic Orientation Lexicon using Large Data and Thesaurus
Amit Goyal | Hal Daumé
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

pdf bib
Approximate Scalable Bounded Space Sketch for Large Data NLP
Amit Goyal | Hal Daumé III
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Corpus-Guided Sentence Generation of Natural Images
Yezhou Yang | Ching Teo | Hal Daumé III | Yiannis Aloimonos
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Improving Bilingual Projections via Sparse Covariance Matrices
Jagadeesh Jagarlamudi | Raghavendra Udupa | Hal Daumé III | Abhijit Bhole
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
From Structured Prediction to Inverse Reinforcement Learning
Hal Daumé III
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

pdf bib
Automatically Producing Plot Unit Representations for Narrative Text
Amit Goyal | Ellen Riloff | Hal Daumé III
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Domain Adaptation meets Active Learning
Piyush Rai | Avishek Saha | Hal Daumé | Suresh Venkatasubramanian
Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing

pdf bib
Toward Plot Units: Automatic Affect State Analysis
Amit Goyal | Ellen Riloff | Hal Daume III | Nathan Gilbert
Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text

pdf bib
Sketching Techniques for Large Scale NLP
Amit Goyal | Jagadeesh Jagarlamudi | Hal Daumé III | Suresh Venkatasubramanian
Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop

pdf bib
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
Hal Daumé III | Tejaswini Deoskar | David McClosky | Barbara Plank | Jörg Tiedemann
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

pdf bib
Frustratingly Easy Semi-Supervised Domain Adaptation
Hal Daumé III | Abhishek Kumar | Avishek Saha
Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing

pdf bib
Sketch Techniques for Scaling Distributional Similarity to the Web
Amit Goyal | Jagadeesh Jagarlamudi | Hal Daumé III | Suresh Venkatasubramanian
Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics

2009

pdf bib
Streaming for large scale NLP: Language Modeling
Amit Goyal | Hal Daumé III | Suresh Venkatasubramanian
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Non-Parametric Bayesian Areal Linguistics
Hal Daumé III
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Markov Random Topic Fields
Hal Daumé III
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

pdf bib
Name Translation in Statistical Machine Translation - Learning When to Transliterate
Ulf Hermjakob | Kevin Knight | Hal Daumé III
Proceedings of ACL-08: HLT

pdf bib
Cross-Task Knowledge-Constrained Self Training
Hal Daumé III
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
A Bayesian Model for Discovering Typological Implications
Hal Daumé III | Lyle Campbell
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
Frustratingly Easy Domain Adaptation
Hal Daumé III
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Bayesian Query-Focused Summarization
Hal Daumé III | Daniel Marcu
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Beyond EM: Bayesian Techniques for Human Language Technology Researchers
Hal Daume III
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

pdf bib
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing
Ryan McDonald | Charles Sutton | Hal Daumé III | Andrew McCallum | Fernando Pereira | Jeff Bilmes
Proceedings of the Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing

2005

pdf bib
Induction of Word and Phrase Alignments for Automatic Document Summarization
Hal Daumé III | Daniel Marcu
Computational Linguistics, Volume 31, Number 4, December 2005

pdf bib
A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model
Hal Daumé III | Daniel Marcu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

pdf bib
Generic Sentence Fusion is an Ill-Defined Summarization Task
Hal Daume III | Daniel Marcu
Text Summarization Branches Out

pdf bib
A Phrase-Based HMM Approach to Document/Abstract Alignment
Hal Daumé III | Daniel Marcu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf bib
NP Bracketing by Maximum Entropy Tagging and SVM Reranking
Hal Daumé III | Daniel Marcu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf bib
Web Search Intent Induction via Automatic Query Reformulation
Hal Daumé III | Eric Brill
Proceedings of HLT-NAACL 2004: Short Papers

2002

pdf bib
The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks
Hal Daume III | Kevin Knight | Irene Langkilde-Geary | Daniel Marcu | Kenji Yamada
Proceedings of the International Natural Language Generation Conference

pdf bib
A Noisy-Channel Model for Document Compression
Hal Daume III | Daniel Marcu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf bib
Integrated Information Management: An Interactive, Extensible Architecture for Information Retrieval
Eric Nyberg | Hal Daume
Proceedings of the First International Conference on Human Language Technology Research

Search
Co-authors