Claire Cardie

Also published as: C. Cardie


2024

I Could’ve Asked That: Reformulating Unanswerable Questions
Wenting Zhao | Ge Gao | Claire Cardie | Alexander M Rush
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

When seeking information from unfamiliar documents, users frequently pose questions that cannot be answered by the documents. While existing large language models (LLMs) identify these unanswerable questions, they do not assist users in reformulating their questions, thereby reducing their overall utility. We curate CouldAsk, an evaluation benchmark composed of existing and new datasets for document-grounded question answering, specifically designed to study reformulating unanswerable questions. We evaluate state-of-the-art open-source and proprietary LLMs on CouldAsk. The results demonstrate the limited capabilities of these models in reformulating questions. Specifically, GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively. Error analysis shows that 62% of the unsuccessful reformulations stem from the models merely rephrasing the questions or even generating identical questions. We publicly release the benchmark and the code to reproduce the experiments.

WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild
Yuntian Deng | Wenting Zhao | Jack Hessel | Xiang Ren | Claire Cardie | Yejin Choi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria. To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds. We demonstrate WildVis’ utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns. WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.

Adapting Fake News Detection to the Era of Large Language Models
Jinyan Su | Claire Cardie | Preslav Nakov
Findings of the Association for Computational Linguistics: NAACL 2024

In the age of large language models (LLMs) and the widespread adoption of AI-driven content creation, the landscape of information dissemination has witnessed a paradigm shift. With the proliferation of both human-written and machine-generated real and fake news, robustly and effectively discerning the veracity of news articles has become an intricate challenge. While substantial research has been dedicated to fake news detection, it has either assumed that all news articles are human-written or has abruptly assumed that all machine-generated news was fake. Thus, a significant gap exists in understanding the interplay between machine-paraphrased real news, machine-generated fake news, human-written fake news, and human-written real news. In this paper, we study this gap by conducting a comprehensive evaluation of fake news detectors trained in various scenarios. Our primary objectives revolve around the following pivotal question: How can we adapt fake news detectors to the era of LLMs? Our experiments reveal an interesting pattern that detectors trained exclusively on human-written articles can indeed perform well at detecting machine-generated fake news, but not vice versa. Moreover, due to the bias of detectors against machine-generated texts (CITATION), they should be trained on datasets with a lower machine-generated news ratio than the test set. Building on our findings, we provide a practical strategy for the development of robust fake news detectors.

Pungene at DialAM-2024: Identification of Propositional and Illocutionary Relations
Sirawut Chaixanien | Eugene Choi | Shaden Shaar | Claire Cardie
Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)

In this paper, we tackle the DialAM-2024 shared task, which aims to annotate dialogue based on inference anchoring theory (IAT). The task can be split into two parts: identification of propositional relations and identification of illocutionary relations. We propose a pipelined system made up of three parts: (1) locutionary-propositions relation detection, (2) propositional relations detection, and (3) illocutionary relations identification. We fine-tune models independently for each step and combine them at the end for the final system. Our proposed system ranks second overall among participants in the shared task, with an average F1-score of 63.7 across both sub-parts.

2023

Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations
Wenting Zhao | Justin Chiu | Claire Cardie | Alexander Rush
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Abductive reasoning aims to find plausible explanations for an event. This style of reasoning is critical for commonsense tasks where there are often multiple plausible explanations. Existing approaches for abductive reasoning in natural language processing (NLP) often rely on manually generated annotations for supervision; however, such annotations can be subjective and biased. Instead of using direct supervision, this work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context. The method uses posterior regularization to enforce a mutual exclusion constraint, encouraging the model to learn the distinction between fluent explanations and plausible ones. We evaluate our approach on a diverse set of abductive reasoning datasets; experimental results show that our approach outperforms or is comparable to directly applying pretrained language models in a zero-shot manner and other knowledge-augmented zero-shot methods.

Probing Representations for Document-level Event Extraction
Barry Wang | Xinya Du | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2023

The probing classifiers framework has been employed for interpreting deep neural network models for a variety of natural language processing (NLP) applications. Studies, however, have largely focused on sentence-level NLP tasks. This work is the first to apply the probing paradigm to representations learned for document-level information extraction (IE). We designed eight embedding probes to analyze surface, semantic, and event-understanding capabilities relevant to document-level event extraction. We apply them to the representations acquired by learning models from three different LLM-based document-level IE approaches on a standard dataset. We found that trained encoders from these models yield embeddings that can modestly improve argument detection and labeling but only slightly enhance event-level tasks, albeit with trade-offs in information helpful for coherence and event-type prediction. We further found that encoder models struggle with document length and cross-sentence discourse.

Hop, Union, Generate: Explainable Multi-hop Reasoning without Rationale Supervision
Wenting Zhao | Justin Chiu | Claire Cardie | Alexander Rush
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Explainable multi-hop question answering (QA) not only predicts answers but also identifies rationales, i.e., subsets of input sentences used to derive the answers. This problem has been extensively studied under the supervised setting, where both answer and rationale annotations are given. Because rationale annotations are expensive to collect and not always available, recent efforts have been devoted to developing methods that do not rely on supervision for rationales. However, such methods have limited capacities in modeling interactions between sentences, let alone reasoning across multiple documents. This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision. Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document. Experimental results show that our approach is more accurate at selecting rationales than the previous methods, while maintaining similar accuracy in predicting answers.

End-to-end Case-Based Reasoning for Commonsense Knowledge Base Completion
Zonglin Yang | Xinya Du | Erik Cambria | Claire Cardie
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Pretrained language models have been shown to store knowledge in their parameters and have achieved reasonable performance in commonsense knowledge base completion (CKBC) tasks. However, CKBC is knowledge-intensive, and it is reported that pretrained language models’ performance in knowledge-intensive tasks is limited because of their inability to access and manipulate knowledge. As a result, we hypothesize that providing retrieved passages that contain relevant knowledge as additional input to the CKBC task will improve performance. In particular, we draw insights from Case-Based Reasoning (CBR), which aims to solve a new problem by reasoning with retrieved relevant cases, and investigate its direct application to CKBC. On two benchmark datasets, we demonstrate through automatic and human evaluations that our End-to-end Case-Based Reasoning Framework (ECBRF) generates more valid, informative, and novel knowledge than the state-of-the-art COMET model for CKBC in both the fully supervised and few-shot settings. We provide insights on why previous retrieval-based methods achieve merely the same performance as COMET. From the perspective of CBR, our framework addresses a fundamental question on whether CBR methodology can be utilized to improve deep learning models.

2022

Compositional Task-Oriented Parsing as Abstractive Question Answering
Wenting Zhao | Konstantine Arkoudas | Weiqi Sun | Claire Cardie
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Task-oriented parsing (TOP) aims to convert natural language into machine-readable representations of specific tasks, such as setting an alarm. A popular approach to TOP is to apply seq2seq models to generate linearized parse trees. A more recent line of work argues that pretrained seq2seq models are better at generating outputs that are themselves natural language, so they replace linearized parse trees with canonical natural-language paraphrases that can then be easily translated into parse trees, resulting in so-called naturalized parsers. In this work we continue to explore naturalized semantic parsing by presenting a general reduction of TOP to abstractive question answering that overcomes some limitations of canonical paraphrasing. Experimental results show that our QA-based technique outperforms state-of-the-art methods in full-data settings while achieving dramatic improvements in few-shot settings.

Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
Faisal Ladhak | Esin Durmus | He He | Claire Cardie | Kathleen McKeown
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite recent progress in abstractive summarization, systems still suffer from faithfulness errors. While prior work has proposed models that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness of the model outputs, since one naive way to improve faithfulness is to make summarization models more extractive. In this work, we present a framework for evaluating the effective faithfulness of summarization systems by generating a faithfulness-abstractiveness trade-off curve that serves as a control at different operating points on the abstractiveness spectrum. We then show that the Maximum Likelihood Estimation (MLE) baseline, as well as recently proposed methods for improving faithfulness, fail to consistently improve over the control at the same level of abstractiveness. Finally, we learn a selector to identify the most faithful and abstractive summary for a given document, and show that this system can attain higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. Moreover, we show that our system is able to achieve a better faithfulness-abstractiveness trade-off than the control at the same level of abstractiveness.

Automatic Error Analysis for Document-level Information Extraction
Aliva Das | Xinya Du | Barry Wang | Kejian Shi | Jiayuan Gu | Thomas Porter | Claire Cardie
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Document-level information extraction (IE) tasks have recently begun to be revisited in earnest using the end-to-end neural network techniques that have been successful on their sentence-level IE counterparts. Evaluation of the approaches, however, has been limited in a number of dimensions. In particular, the precision/recall/F1 scores typically reported provide few insights on the range of errors the models make. We build on the work of Kummerfeld and Klein (2013) to propose a transformation-based framework for automating error analysis in document-level event and (N-ary) relation extraction. We employ our framework to compare two state-of-the-art document-level template-filling approaches on datasets from three domains; and then, to gauge progress in IE since its inception 30 years ago, vs. four systems from the MUC-4 (1992) evaluation.

Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge
Kai Sun | Dian Yu | Jianshu Chen | Dong Yu | Claire Cardie
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To perform well on a machine reading comprehension (MRC) task, machine readers usually require commonsense knowledge that is not explicitly mentioned in the given documents. This paper aims to extract a new kind of structured knowledge from scripts and use it to improve MRC. We focus on scripts as they contain rich verbal and nonverbal messages, and two relevant messages originally conveyed by different modalities during a short time period may serve as arguments of a piece of commonsense knowledge as they function together in daily communications. To save the human effort of naming relations, we propose to represent relations implicitly by situating such an argument pair in a context and call it contextualized knowledge. To use the extracted knowledge to improve MRC, we compare several fine-tuning strategies to use the weakly-labeled MRC data constructed based on contextualized knowledge and further design a teacher-student paradigm with multiple teachers to facilitate the transfer of knowledge in weakly-labeled MRC data. Experimental results show that our paradigm outperforms other methods that use weakly-labeled data and improves a state-of-the-art baseline by 4.3% in accuracy on a Chinese multiple-choice MRC dataset, C3, wherein most of the questions require unstated prior knowledge. We also seek to transfer the knowledge to other tasks by simply adapting the resulting student reader, yielding a 2.9% improvement in F1 on a relation extraction dataset, DialogRE, demonstrating the potential usefulness of the knowledge for non-MRC tasks that require document comprehension.

BeSt: The Belief and Sentiment Corpus
Jennifer Tracey | Owen Rambow | Claire Cardie | Adam Dalton | Hoa Trang Dang | Mona Diab | Bonnie Dorr | Louise Guthrie | Magdalena Markowska | Smaranda Muresan | Vinodkumar Prabhakaran | Samira Shaikh | Tomek Strzalkowski
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what. This corpus is inspired by similar source-and-target corpora, specifically MPQA and FactBank. The corpus comprises two genres, newswire and discussion forums, in three languages, Chinese (Mandarin), English, and Spanish. The corpus is distributed through the LDC.

2021

Template Filling with Generative Transformers
Xinya Du | Alexander Rush | Claire Cardie
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Template filling is generally tackled by a pipeline of two separate supervised systems – one for role-filler extraction and another for template/event recognition. Since pipelines consider events in isolation, they can suffer from error propagation. We introduce a framework based on end-to-end generative transformers for this task (i.e., GTT). It naturally models the dependence between entities both within a single event and across the multiple events described in a document. Experiments demonstrate that this framework substantially outperforms pipeline-based approaches, and other neural end-to-end baselines that do not model between-event dependencies. We further show that our framework specifically improves performance on documents containing multiple events.

Adding Chit-Chat to Enhance Task-Oriented Dialogues
Kai Sun | Seungwhan Moon | Paul Crook | Stephen Roller | Becka Silvert | Bing Liu | Zhiguang Wang | Honglei Liu | Eunjoon Cho | Claire Cardie
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Existing dialogue corpora and models are typically designed under two disjoint motives: while task-oriented systems focus on achieving functional goals (e.g., booking hotels), open-domain chatbots aim at making socially engaging conversations. In this work, we propose to integrate both types of systems by Adding Chit-Chat to ENhance Task-ORiented dialogues (ACCENTOR), with the goal of making virtual assistant conversations more engaging and interactive. Specifically, we propose a Human <-> AI collaborative data collection approach for generating diverse chit-chat responses to augment task-oriented dialogues with minimal annotation effort. We then present our new chit-chat-based annotations to 23.8K dialogues from two popular task-oriented datasets (Schema-Guided Dialogue and MultiWOZ 2.1) and demonstrate their advantage over the originals via human evaluation. Lastly, we propose three new models for adding chit-chat to task-oriented dialogues, explicitly trained to predict user goals and to generate contextually relevant chit-chat responses. Automatic and human evaluations show that, compared with the state-of-the-art task-oriented baseline, our models can code-switch between task and chit-chat to be more engaging, interesting, knowledgeable, and humanlike, while maintaining competitive task performance.

GRIT: Generative Role-filler Transformers for Document-level Event Entity Extraction
Xinya Du | Alexander Rush | Claire Cardie
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

We revisit the classic problem of document-level role-filler entity extraction (REE) for template filling. We argue that sentence-level approaches are ill-suited to the task and introduce a generative transformer-based encoder-decoder framework (GRIT) that is designed to model context at the document level: it can make extraction decisions across sentence boundaries, is implicitly aware of noun phrase coreference structure, and has the capacity to respect cross-role dependencies in the template structure. We evaluate our approach on the MUC-4 dataset and show that our model performs substantially better than prior work. We also show that our modeling choices contribute to model performance, e.g., by implicitly capturing linguistic knowledge such as recognizing coreferent entity mentions.

Leveraging Topic Relatedness for Argument Persuasion
Xinran Zhao | Esin Durmus | Hongming Zhang | Claire Cardie
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data
Dian Yu | Kai Sun | Dong Yu | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2021

Despite considerable progress, most machine reading comprehension (MRC) tasks still lack sufficient training data to fully exploit powerful deep neural network models with millions of parameters, and it is laborious, expensive, and time-consuming to create large-scale, high-quality MRC data through crowdsourcing. This paper focuses on generating more training data for MRC tasks by leveraging existing question-answering (QA) data. We first collect a large-scale multi-subject multiple-choice QA dataset for Chinese, ExamQA. We next use incomplete, yet relevant snippets returned by a web search engine as the context for each QA instance to convert it into a weakly-labeled MRC instance. To better use the weakly-labeled data to improve a target MRC task, we evaluate and compare several methods and further propose a self-teaching paradigm. Experimental results show that, upon state-of-the-art MRC baselines, we can obtain +5.1% in accuracy on a multiple-choice Chinese MRC dataset, C3, and +3.8% in exact match on an extractive Chinese MRC dataset, CMRC 2018, demonstrating the usefulness of the generated QA-based weakly-labeled data for different types of MRC tasks as well as the effectiveness of self-teaching. ExamQA will be available at https://dataset.org/examqa/.

When in Doubt: Improving Classification Performance with Alternating Normalization
Menglin Jia | Austin Reiter | Ser-Nam Lim | Yoav Artzi | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2021

We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification. CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution using the predicted class distributions of high-confidence validation examples. CAN is easily applicable to any probabilistic classifier, with minimal computation overhead. We analyze the properties of CAN using simulated experiments, and empirically demonstrate its effectiveness across a diverse set of classification tasks.

2020

Leveraging Structured Metadata for Improving Question Answering on the Web
Xinya Du | Ahmed Hassan Awadallah | Adam Fourney | Robert Sim | Paul Bennett | Claire Cardie
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

We show that leveraging metadata information from web pages can improve the performance of models for answer passage selection/reranking. We propose a neural passage selection model that leverages metadata information with a fine-grained encoding strategy, which learns the representation for metadata predicates in a hierarchical way. The models are evaluated on the MS MARCO (Nguyen et al., 2016) and Recipe-MARCO datasets. Results show that our models significantly outperform baseline models, which do not incorporate metadata. We also show the fine-grained encoding’s advantage over other strategies for encoding the metadata.

Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings
Rishi Bommasani | Kelly Davis | Claire Cardie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Contextualized representations (e.g. ELMo, BERT) have become the default pretrained representations for downstream NLP applications. In some settings, this transition has rendered their static embedding predecessors (e.g. Word2Vec, GloVe) obsolete. As a side-effect, we observe that older interpretability methods for static embeddings — while more diverse and mature than those available for their dynamic counterparts — are underutilized in studying newer contextualized representations. Consequently, we introduce simple and fully general methods for converting from contextualized representations to static lookup-table embeddings which we apply to 5 popular pretrained models and 9 sets of pretrained weights. Our analysis of the resulting static embeddings notably reveals that pooling over many contexts significantly improves representational quality under intrinsic evaluation. Complementary to analyzing representational quality, we consider social biases encoded in pretrained representations with respect to gender, race/ethnicity, and religion and find that bias is encoded disparately across pretrained models and internal layers even for models with the same training data. Concerningly, we find dramatic inconsistencies between social bias estimators for word embeddings.

Dialogue-Based Relation Extraction
Dian Yu | Kai Sun | Claire Cardie | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue. We further offer DialogRE as a platform for studying cross-sentence RE as most facts span multiple sentences. We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks. Considering the timeliness of communication in a dialogue, we design a new metric to evaluate the performance of RE methods in a conversational setting and investigate the performance of several representative RE methods on DialogRE. Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings. DialogRE is available at https://dataset.org/dialogre/.

Document-Level Event Role Filler Extraction using Multi-Granularity Contextualized Encoding
Xinya Du | Claire Cardie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Few works in the literature of event extraction have gone beyond individual sentences to make extraction decisions. This is problematic when the information needed to recognize an event argument is spread across multiple sentences. We argue that document-level event extraction is a difficult task since it requires a view of a larger context to determine which spans of text correspond to event role fillers. We first investigate how end-to-end neural sequence models (with pre-trained language model representations) perform on document-level role filler extraction, as well as how the length of context captured affects the models’ performance. To dynamically aggregate information captured by neural representations learned at different levels of granularity (e.g., the sentence- and paragraph-level), we propose a novel multi-granularity reader. We evaluate our models on the MUC-4 event extraction dataset, and show that our best system performs substantially better than prior work. We also report findings on the relationship between context length and neural model performance on the task.

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 8

Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and data augmentation based on translated relevant datasets for English on model performance. We expect C3 to present great challenges to existing systems as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C3 can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text. C3 is available at https://dataset.org/c3/.

SUMSUM@FNS-2020 Shared Task
Siyan Zheng | Anneliese Lu | Claire Cardie
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

This paper describes the SUMSUM systems submitted to the Financial Narrative Summarization Shared Task (FNS-2020). We explore a section-based extractive summarization method tailored to the structure of financial reports: our best system parses the report Table of Contents (ToC), splits the report into narrative sections based on the ToC, and applies a BERT-based classifier to each section to determine whether it should be included in the summary. Our best system ranks 4th, 1st, 2nd and 17th on the Rouge-1, Rouge-2, Rouge-SU4, and Rouge-L official metrics, respectively. We also report results on the validation set using an alternative set of Rouge-based metrics that measure performance with respect to the best-matching of the available gold summaries.

Improving Event Duration Prediction via Time-aware Pre-training
Zonglin Yang | Xinya Du | Alexander Rush | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2020

End-to-end models in NLP rarely encode external world knowledge about length of time. We introduce two effective models for duration prediction, which incorporate external knowledge by reading temporal-related news sentences (time-aware pre-training). Specifically, one model predicts the range/unit in which the duration value falls (R-PRED), and the other predicts the exact duration value (E-PRED). Our best model, E-PRED, substantially outperforms previous work and captures duration information more accurately than R-PRED. We also demonstrate that our models are capable of duration prediction in the unsupervised setting, outperforming the baselines.

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
Faisal Ladhak | Esin Durmus | Claire Cardie | Kathleen McKeown
Findings of the Association for Computational Linguistics: EMNLP 2020

We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems. We extract article and summary pairs in 18 languages from WikiHow, a high quality, collaborative resource of how-to guides on a diverse set of topics written by human authors. We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We further propose a method for direct cross-lingual summarization (i.e., without requiring translation at inference time) by leveraging synthetic data and Neural Machine Translation as a pre-training step. Our method significantly outperforms the baseline approaches, while being more cost efficient during inference.

pdf bib
Event Extraction by Answering (Almost) Natural Questions
Xinya Du | Claire Cardie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The problem of event extraction requires detecting the event trigger and extracting its corresponding arguments. Existing work in event argument extraction typically relies heavily on entity recognition as a preprocessing/concurrent step, causing the well-known problem of error propagation. To avoid this issue, we introduce a new paradigm for event extraction by formulating it as a question answering (QA) task that extracts the event arguments in an end-to-end manner. Empirical results demonstrate that our framework outperforms prior methods substantially; in addition, it is capable of extracting event arguments for roles not seen at training time (i.e., in a zero-shot learning setting).

pdf bib
Intrinsic Evaluation of Summarization Datasets
Rishi Bommasani | Claire Cardie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

High quality data forms the bedrock for building meaningful statistical models in NLP. Consequently, data quality must be evaluated either during dataset construction or *post hoc*. Almost all popular summarization datasets are drawn from natural sources and do not come with inherent quality assurance guarantees. In spite of this, data quality has gone largely unquestioned for many of these recent datasets. We perform the first large-scale evaluation of summarization datasets by introducing 5 intrinsic metrics and applying them to 10 popular datasets. We find that data usage in recent summarization research is sometimes inconsistent with the underlying properties of the data. Further, we discover that our metrics can serve the additional purpose of being inexpensive heuristics for detecting generically low quality examples.

pdf bib
Exploring the Role of Argument Structure in Online Debate Persuasion
Jialu Li | Esin Durmus | Claire Cardie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Online debate forums provide users a platform to express their opinions on controversial topics while being exposed to opinions from a diverse set of viewpoints. Existing work in Natural Language Processing (NLP) has shown that linguistic features extracted from the debate text and features encoding the characteristics of the audience are both critical in persuasion studies. In this paper, we aim to further investigate the role of discourse structure of the arguments from online debates in their persuasiveness. In particular, we use the factor graph model to obtain features for the argument structure of debates from an online debating platform and incorporate these features into an LSTM-based model to predict the debater that makes the most convincing arguments. We find that incorporating argument structure features plays an essential role in achieving the best predictive performance in assessing the persuasiveness of the arguments in online debates.

2019

pdf bib
A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
Esin Durmus | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Existing argumentation datasets have succeeded in allowing researchers to develop computational methods for analyzing the content, structure and linguistic features of argumentative text. They have been much less successful in fostering studies of the effect of “user” traits — characteristics and beliefs of the participants — on the debate/argument outcome as this type of user information is generally not available. This paper presents a dataset of 78,376 debates generated over a 10-year period along with surprisingly comprehensive participant profiles. We also complete an example study using the dataset to analyze the effect of selected user traits on the debate outcome in comparison to the linguistic features typically employed in studies of this kind.

pdf bib
Multi-Source Cross-Lingual Model Transfer: Learning What to Share
Xilun Chen | Ahmed Hassan Awadallah | Hany Hassan | Wei Wang | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Modern NLP applications have enjoyed a great boost utilizing neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks including a large-scale industry dataset.

pdf bib
Keeping Notes: Conditional Natural Language Generation with a Scratchpad Encoder
Ryan Benmalek | Madian Khabsa | Suma Desu | Claire Cardie | Michele Banko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. By enabling the decoder at each time step to write to all of the encoder output layers, Scratchpad can employ the encoder as a “scratchpad” memory to keep track of what has been generated so far and thereby guide future generation. We evaluate Scratchpad in the context of three well-studied natural language generation tasks — Machine Translation, Question Generation, and Text Summarization — and obtain state-of-the-art or comparable performance on standard datasets for each task. Qualitative assessments in the form of human judgements (question generation), attention visualization (MT), and sample output (summarization) provide further evidence of the ability of Scratchpad to generate fluent and expressive output.

pdf bib
Determining Relative Argument Specificity and Stance for Complex Argumentative Structures
Esin Durmus | Faisal Ladhak | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Systems for automatic argument generation and debate require the ability to (1) determine the stance of any claims employed in the argument and (2) assess the specificity of each claim relative to the argument context. Existing work on understanding claim specificity and stance, however, has been limited to the study of argumentative structures that are relatively shallow, most often consisting of a single claim that directly supports or opposes the argument thesis. In this paper, we tackle these tasks in the context of complex arguments on a diverse set of topics. In particular, our dataset consists of manually curated argument trees for 741 controversial topics covering 95,312 unique claims; lines of argument are generally of depth 2 to 6. We find that as the distance between a pair of claims increases along the argument path, determining the relative specificity of a pair of claims becomes easier and determining their relative stance becomes harder.

pdf bib
DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension
Kai Sun | Dian Yu | Jianshu Chen | Dong Yu | Yejin Choi | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 7

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at https://dataset.org/dream/.

pdf bib
Be Consistent! Improving Procedural Text Comprehension using Label Consistency
Xinya Du | Bhavana Dalvi | Niket Tandon | Antoine Bosselut | Wen-tau Yih | Peter Clark | Claire Cardie
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe). This task is challenging as the world is changing throughout the text, and despite recent advances, current systems still struggle with this task. Our approach is to leverage the fact that, for many procedural texts, multiple independent descriptions are readily available, and that predictions from them should be consistent (label consistency). We present a new learning framework that leverages label consistency during training, allowing consistency bias to be built into the model. Evaluation on a standard benchmark dataset for procedural text, ProPara (Dalvi et al., 2018), shows that our approach significantly improves prediction performance (F1) over prior state-of-the-art systems.

pdf bib
Improving Machine Reading Comprehension with General Reading Strategies
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources - just a pre-trained model and a fixed number of training instances - we propose three general strategies aimed to improve non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING that considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT that generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8% absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies. We further fine-tune the resulting model on a target MRC task, leading to an absolute improvement of 6.2% in average accuracy over previous state-of-the-art approaches on six representative non-extractive MRC datasets from different domains (i.e., ARC, OpenBookQA, MCTest, SemEval-2018 Task 11, ROCStories, and MultiRC). These results demonstrate the effectiveness of our proposed strategies and the versatility and general applicability of our fine-tuned models that incorporate these strategies. Core code is available at https://github.com/nlpdata/strategy/.

pdf bib
The Role of Pragmatic and Discourse Context in Determining Argument Impact
Esin Durmus | Faisal Ladhak | Claire Cardie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Research in the social sciences and psychology has shown that the persuasiveness of an argument depends not only on the language employed, but also on attributes of the source/communicator, the audience, and the appropriateness and strength of the argument’s claims given the pragmatic and discourse context of the argument. Among these characteristics of persuasive arguments, prior work in NLP does not explicitly investigate the effect of the pragmatic and discourse context when determining argument quality. This paper presents a new dataset to initiate the study of this aspect of argumentation: it consists of a diverse collection of arguments covering 741 controversial topics and comprising over 47,000 claims. We further propose predictive models that incorporate the pragmatic and discourse context of argumentative claims and show that they outperform models that rely only on claim-specific linguistic features for predicting the perceived impact of individual claims within a particular line of argument.

pdf bib
Improving Question Answering with External Knowledge
Xiaoman Pan | Kai Sun | Dian Yu | Jianshu Chen | Heng Ji | Claire Cardie | Dong Yu
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: in comparison to the previous state-of-the-art, we obtain absolute gains in accuracy of up to 8.1%, 13.0%, and 12.8%, respectively. While we observe consistent gains when we introduce knowledge from Wikipedia, we find that employing additional QA training instances is not uniformly helpful: performance degrades when the added instances exhibit a higher level of difficulty than the original training data. As one of the first studies on exploiting unstructured external knowledge for subject-area QA, we hope our methods, observations, and discussion of the exposed limitations may shed light on further developments in the area.

pdf bib
SPARSE: Structured Prediction using Argument-Relative Structured Encoding
Rishi Bommasani | Arzoo Katiyar | Claire Cardie
Proceedings of the Third Workshop on Structured Prediction for NLP

We propose structured encoding as a novel approach to learning representations for relations and events in neural structured prediction. Our approach explicitly leverages the structure of available relation and event metadata to generate these representations, which are parameterized by both the attribute structure of the metadata as well as the learned representation of the arguments of the relations and events. We consider affine, biaffine, and recurrent operators for building hierarchical representations and modelling underlying features. We apply our approach to the second-order structured prediction task studied in the 2016/2017 Belief and Sentiment analysis evaluations (BeSt): given a document and its entities, relations, and events (including metadata and mentions), determine the sentiment of each entity towards every relation and event in the document. Without task-specific knowledge sources or domain engineering, we significantly improve over systems and baselines that neglect the available metadata or its hierarchical structure. We observe across-the-board improvements on the BeSt 2016/2017 sentiment analysis task of at least 2.3 (absolute) and 10.6% (relative) F-measure over the previous state-of-the-art.

pdf bib
Persuasion of the Undecided: Language vs. the Listener
Liane Longpre | Esin Durmus | Claire Cardie
Proceedings of the 6th Workshop on Argument Mining

This paper examines the factors that govern persuasion for a priori UNDECIDED versus DECIDED audience members in the context of on-line debates. We separately study two types of influences: linguistic factors — features of the language of the debate itself; and audience factors — features of an audience member encoding demographic information, prior beliefs, and debate platform behavior. In a study of users of a popular debate platform, we find first that different combinations of linguistic features are critical for predicting persuasion outcomes for UNDECIDED versus DECIDED members of the audience. We additionally find that audience factors have more influence on predicting the side (PRO/CON) that persuaded UNDECIDED users than for DECIDED users that flip their stance to the opposing side. Our results emphasize the importance of considering the undecided and decided audiences separately when studying linguistic factors of persuasion.

2018

pdf bib
Nested Named Entity Recognition Revisited
Arzoo Katiyar | Claire Cardie
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We propose a novel recurrent neural network-based approach to simultaneously handle nested named entity recognition and nested entity mention detection. The model learns a hypergraph representation for nested entities using features extracted from a recurrent neural network. In evaluations on three standard data sets, we show that our approach significantly outperforms existing state-of-the-art methods, which are feature-based. The approach is also efficient: it operates linearly in the number of tokens and the number of possible output labels at any token. Finally, we present an extension of our model that jointly learns the head of each entity mention.

pdf bib
Exploring the Role of Prior Beliefs for Argument Persuasion
Esin Durmus | Claire Cardie
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Public debate forums provide a common platform for exchanging opinions on a topic of interest. While recent studies in natural language processing (NLP) have provided empirical evidence that the language of the debaters and their patterns of interaction play a key role in changing the mind of a reader, research in psychology has shown that prior beliefs can affect our interpretation of an argument and could therefore constitute a competing alternative explanation for resistance to changing one’s stance. To study the actual effect of language use vs. prior beliefs on persuasion, we provide a new dataset and propose a controlled setting that takes into consideration two reader-level factors: political and religious ideology. We find that prior beliefs affected by these reader-level factors play a more important role than language use effects and argue that it is important to account for them in NLP studies of persuasion.

pdf bib
Multinomial Adversarial Networks for Multi-Domain Text Classification
Xilun Chen | Claire Cardie
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Many text classification tasks are known to be highly domain-dependent. Unfortunately, the availability of training data can vary drastically across domains. Worse still, for some domains there may not be any annotated data at all. In this work, we propose a multinomial adversarial network (MAN) to tackle this real-world problem of multi-domain text classification (MDTC) in which labeled data may exist for multiple domains, but in insufficient amounts to train effective classifiers for one or more of the domains. We provide theoretical justifications for the MAN framework, proving that different instances of MANs are essentially minimizers of various f-divergence metrics (Ali and Silvey, 1966) among multiple probability distributions. MANs are thus a theoretically sound generalization of traditional adversarial networks that discriminate over two distributions. More specifically, for the MDTC task, MAN learns features that are invariant across multiple domains by resorting to its ability to reduce the divergence among the feature distributions of each domain. We present experimental results showing that MANs significantly outperform the prior art on the MDTC task. We also show that MANs achieve state-of-the-art performance for domains with no labeled data.

pdf bib
Harvesting Paragraph-level Question-Answer Pairs from Wikipedia
Xinya Du | Claire Cardie
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the task of generating from Wikipedia articles question-answer pairs that cover content beyond a single sentence. We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. As compared to models that only take into account sentence-level information (Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the linguistic knowledge introduced by the coreference representation aids question generation significantly, producing models that outperform the current state-of-the-art. We apply our system (composed of an answer span extraction system and the passage-level QG system) to the 10,000 top ranking Wikipedia articles and create a corpus of over one million question-answer pairs. We provide qualitative analysis for this large-scale generated corpus from Wikipedia.

pdf bib
A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments
Joonsuk Park | Claire Cardie
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
Xilun Chen | Yu Sun | Ben Athiwaratkun | Claire Cardie | Kilian Weinberger
Transactions of the Association for Computational Linguistics, Volume 6

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial language discriminator. Both branches take input from a shared feature extractor to learn hidden representations that are simultaneously indicative for the classification task and invariant across languages. Experiments on Chinese and Arabic sentiment classification demonstrate that ADAN significantly outperforms state-of-the-art systems.

pdf bib
Understanding the Effect of Gender and Stance in Opinion Expression in Debates on “Abortion”
Esin Durmus | Claire Cardie
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

In this paper, we focus on understanding linguistic differences across groups with different self-identified gender and stance in expressing opinions about ABORTION. We provide a new dataset consisting of users’ gender and stance on ABORTION as well as the debates on ABORTION drawn from debate.org. We use the gender and stance information to identify significant linguistic differences across individuals with different gender and stance. We show the importance of considering the stance information along with the gender, since we observe significant linguistic differences across individuals with different stance even within the same gender group.

pdf bib
Unsupervised Multilingual Word Embeddings
Xilun Chen | Claire Cardie
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multilingual Word Embeddings (MWEs) represent words from multiple languages in a single distributional vector space. Unsupervised MWE (UMWE) methods acquire multilingual embeddings without cross-lingual supervision, which is a significant advantage over traditional supervised approaches and opens many new possibilities for low-resource languages. Prior art for learning UMWEs, however, merely relies on a number of independently trained Unsupervised Bilingual Word Embeddings (UBWEs) to obtain multilingual embeddings. These methods fail to leverage the interdependencies that exist among many languages. To address this shortcoming, we propose a fully unsupervised framework for learning MWEs that directly exploits the relations between all language pairs. Our model substantially outperforms previous approaches in the experiments on multilingual word translation and cross-lingual word similarity. In addition, our model even beats supervised approaches trained with cross-lingual resources.

pdf bib
Towards Dynamic Computation Graphs via Sparse Latent Structure
Vlad Niculae | André F. T. Martins | Claire Cardie
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep NLP models benefit from underlying structures in the data—e.g., parse trees—typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.

2017

pdf bib
Identifying Where to Focus in Reading Comprehension for Neural Question Generation
Xinya Du | Claire Cardie
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

A first step in the task of automatically generating questions for testing reading comprehension is to identify question-worthy sentences, i.e. sentences in a text passage that humans find it worthwhile to ask questions about. We propose a hierarchical neural sentence-level sequence tagging model for this task, which existing approaches to question generation have ignored. The approach is fully data-driven — with no sophisticated NLP pipelines or any hand-crafted rules/features — and compares favorably to a number of baselines when evaluated on the SQuAD data set. When incorporated into an existing neural question generation system, the resulting end-to-end system achieves state-of-the-art performance for paragraph-level question generation for reading comprehension.

pdf bib
Proceedings of the 4th Workshop on Argument Mining
Ivan Habernal | Iryna Gurevych | Kevin Ashley | Claire Cardie | Nancy Green | Diane Litman | Georgios Petasis | Chris Reed | Noam Slonim | Vern Walker
Proceedings of the 4th Workshop on Argument Mining

pdf bib
Going out on a limb: Joint Extraction of Entity Mentions and Relations without Dependency Trees
Arzoo Katiyar | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a novel attention-based recurrent neural network for joint extraction of entity mentions and relations. We show that attention along with a long short-term memory (LSTM) network can extract semantic relations between entity mentions without having access to dependency trees. Experiments on Automatic Content Extraction (ACE) corpora show that our model significantly outperforms the feature-based joint model by Li and Ji (2014). We also compare our model with an end-to-end tree-based LSTM model (SPTree) by Miwa and Bansal (2016) and show that our model performs within 1% on entity mentions and 2% on relations. Our fine-grained analysis also shows that our model performs significantly better on Agent-Artifact relations, while SPTree performs better on Physical and Part-Whole relations.

pdf bib
Argument Mining with Structured SVMs and RNNs
Vlad Niculae | Joonsuk Park | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.

pdf bib
Learning to Ask: Neural Question Generation for Reading Comprehension
Xinya Du | Junru Shao | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study automatic question generation for sentences from text passages in reading comprehension. We introduce an attention-based sequence learning model for the task and investigate the effect of encoding sentence- vs. paragraph-level information. In contrast to all previous work, our model does not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead trainable end-to-end via sequence-to-sequence learning. Automatic evaluation results show that our system significantly outperforms the state-of-the-art rule-based system. In human evaluations, questions generated by our system are also rated as being more natural (i.e., grammaticality, fluency) and as more difficult to answer (in terms of syntactic and lexical divergence from the original text and reasoning needed to answer).

2016

pdf bib
Investigating LSTMs for Joint Extraction of Opinion Entities and Relations
Arzoo Katiyar | Claire Cardie
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Iñigo Lopez-Gazpio | Montse Maritxalar | Rada Mihalcea | German Rigau | Larraitz Uria | Janyce Wiebe
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Socially-Informed Timeline Generation for Complex Events
Lu Wang | Claire Cardie | Galen Marchetti
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the 2nd Workshop on Argumentation Mining
Claire Cardie
Proceedings of the 2nd Workshop on Argumentation Mining

pdf bib
A Hierarchical Distance-dependent Bayesian Model for Event Coreference Resolution
Bishan Yang | Claire Cardie | Peter Frazier
Transactions of the Association for Computational Linguistics, Volume 3

We present a novel hierarchical distance-dependent Bayesian model for event coreference resolution. While existing generative models for event coreference resolution are completely unsupervised, our model allows for the incorporation of pairwise distances between event mentions (information that is widely used in supervised coreference models) to guide the generative clustering process for better event clustering both within and across documents. We model the distances between event mentions using a feature-rich learnable distance function and encode them as Bayesian priors for nonparametric clustering. Experiments on the ECB+ corpus show that our model outperforms state-of-the-art methods for both within- and cross-document event coreference resolution.

2014

pdf bib
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Rada Mihalcea | German Rigau | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
SimCompass: Using Deep Learning Word Embeddings to Assess Cross-level Similarity
Carmen Banea | Di Chen | Rada Mihalcea | Claire Cardie | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Identifying Appropriate Support for Propositions in Online User Comments
Joonsuk Park | Claire Cardie
Proceedings of the First Workshop on Argumentation Mining

pdf bib
Overview of the 2014 NLP Unshared Task in PoliInformatics
Noah A. Smith | Claire Cardie | Anne Washington | John Wilkerson
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
Improving Agreement and Disagreement Identification in Online Discussions with A Socially-Tuned Sentiment Lexicon
Lu Wang | Claire Cardie
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
The Enrollment Effect: A Study of Amazon’s Vine Program
Dinesh Puranam | Claire Cardie
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

pdf bib
Joint Modeling of Opinion Expression Extraction and Attribute Classification
Bishan Yang | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 2

In this paper, we study the problems of opinion expression extraction and expression-level polarity and intensity classification. Traditional fine-grained opinion analysis systems address these problems in isolation and thus cannot capture interactions among the textual spans of opinion expressions and their opinion-related properties. We present two types of joint approaches that can account for such interactions either 1) during both learning and inference or 2) only during inference. Extensive experiments on a standard dataset demonstrate that our approaches provide substantial improvements over previously published results. By analyzing the results, we gain some insight into the advantages of different joint models.

pdf bib
Opinion Mining with Deep Recurrent Neural Networks
Ozan İrsoy | Claire Cardie
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts
Jiwei Li | Alan Ritter | Claire Cardie | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Context-aware Learning for Sentence-level Sentiment Analysis with Posterior Regularization
Bishan Yang | Claire Cardie
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Towards a General Rule for Identifying Deceptive Opinion Spam
Jiwei Li | Myle Ott | Claire Cardie | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection
Lu Wang | Claire Cardie
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Query-Focused Opinion Summarization for User-Generated Content
Lu Wang | Hema Raghavan | Claire Cardie | Vittorio Castelli
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Book Reviews: Sentiment Analysis and Opinion Mining by Bing Liu
Claire Cardie
Computational Linguistics, Volume 40, Issue 2 - June 2014

2013

pdf bib
Identifying Manipulated Offerings on Review Portals
Jiwei Li | Myle Ott | Claire Cardie
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Lu Wang | Hema Raghavan | Vittorio Castelli | Radu Florian | Claire Cardie
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Domain-Independent Abstract Generation for Focused Meeting Summarization
Lu Wang | Claire Cardie
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Joint Inference for Fine-grained Opinion Extraction
Bishan Yang | Claire Cardie
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
TopicSpam: a Topic-Model based approach for spam detection
Jiwei Li | Claire Cardie | Sujian Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Negative Deceptive Opinion Spam
Myle Ott | Claire Cardie | Jeffrey T. Hancock
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
CPN-CORE: A Text Semantic Similarity System Infused with Opinion Knowledge
Carmen Banea | Yoonjung Choi | Lingjia Deng | Samer Hassan | Michael Mohler | Bishan Yang | Claire Cardie | Rada Mihalcea | Jan Wiebe
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

pdf bib
In Search of a Gold Standard in Studies of Deception
Stephanie Gokhman | Jeff Hancock | Poornima Prabhu | Myle Ott | Claire Cardie
Proceedings of the Workshop on Computational Approaches to Deception Detection

pdf bib
Unsupervised Topic Modeling Approaches to Decision Summarization in Spoken Meetings
Lu Wang | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Improving Implicit Discourse Relation Recognition Through Feature Set Optimization
Joonsuk Park | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Focused Meeting Summarization via Unsupervised Relation Extraction
Lu Wang | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Extracting Opinion Expressions with semi-Markov Conditional Random Fields
Bishan Yang | Claire Cardie
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
Myle Ott | Yejin Choi | Claire Cardie | Jeffrey T. Hancock
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
Bin Lu | Chenhao Tan | Claire Cardie | Benjamin K. Tsou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Summarizing Decisions in Spoken Meetings
Lu Wang | Claire Cardie
Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages

pdf bib
Reconciling OntoNotes: Unrestricted Coreference Resolution in OntoNotes with Reconcile.
Veselin Stoyanov | Uday Babbar | Pracheer Gupta | Claire Cardie
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Compositional Matrix-Space Models for Sentiment Analysis
Ainur Yessenalina | Claire Cardie
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Automatically Creating General-Purpose Opinion Summaries from Text
Veselin Stoyanov | Claire Cardie
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

pdf bib
Coreference Resolution with Reconcile
Veselin Stoyanov | Claire Cardie | Nathan Gilbert | Ellen Riloff | David Buttler | David Hysom
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
Yejin Choi | Claire Cardie
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Automatically Generating Annotator Rationales to Improve Sentiment Classification
Ainur Yessenalina | Yejin Choi | Claire Cardie
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Multi-Level Structured Models for Document-Level Sentiment Classification
Ainur Yessenalina | Yisong Yue | Claire Cardie
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Adapting a Polarity Lexicon using Integer Linear Programming for Domain-Specific Sentiment Classification
Yejin Choi | Claire Cardie
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-Art
Veselin Stoyanov | Nathan Gilbert | Claire Cardie | Ellen Riloff
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
Annotating Topics of Opinions
Veselin Stoyanov | Claire Cardie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Fine-grained subjectivity analysis has been the subject of much recent research attention. As a result, the field has gained a number of working definitions, technical approaches and manually annotated corpora that cover many facets of subjectivity. Little work has been done, however, on one aspect of fine-grained opinions - the specification and identification of opinion topics. In particular, due to the difficulty of manual opinion topic annotation, no general-purpose opinion corpus with information about topics of fine-grained opinions currently exists. In this paper, we propose a methodology for the manual annotation of opinion topics and use it to annotate a portion of an existing general-purpose opinion corpus with opinion topic information. Inter-annotator agreement results according to a number of metrics suggest that the annotations are reliable.

pdf bib
An eRulemaking Corpus: Identifying Substantive Issues in Public Comments
Claire Cardie | Cynthia Farina | Matt Rawding | Adil Aijaz
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators. We also briefly describe the results of experiments that apply standard and hierarchical text categorization techniques to the eRulemaking data sets. The corpus is the first in a series of related sentence-level text categorization corpora to be developed in the eRulemaking domain.

pdf bib
Topic Identification for Fine-Grained Opinion Analysis
Veselin Stoyanov | Claire Cardie
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
The Power of Negative Thinking: Exploiting Label Disagreement in the Min-cut Classification Framework
Mohit Bansal | Claire Cardie | Lillian Lee
Coling 2008: Companion volume: Posters

pdf bib
Learning with Compositional Semantics as Structural Inference for Subsentential Sentiment Analysis
Yejin Choi | Claire Cardie
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Structured Local Training and Biased Potential Functions for Conditional Random Fields with Application to Coreference Resolution
Yejin Choi | Claire Cardie
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

pdf bib
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Nicoletta Calzolari | Claire Cardie | Pierre Isabelle
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Toward Opinion Summarization: Linking the Sources
Veselin Stoyanov | Claire Cardie
Proceedings of the Workshop on Sentiment and Subjectivity in Text

pdf bib
Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning
Veselin Stoyanov | Claire Cardie
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Joint Extraction of Entities and Relations for Opinion Recognition
Yejin Choi | Eric Breck | Claire Cardie
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

pdf bib
Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns
Yejin Choi | Claire Cardie | Ellen Riloff | Siddharth Patwardhan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Optimizing to Arbitrary NLP Metrics using Ensemble Selection
Art Munson | Claire Cardie | Rich Caruana
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Multi-Perspective Question Answering Using the OpQA Corpus
Veselin Stoyanov | Claire Cardie | Janyce Wiebe
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
OpinionFinder: A System for Subjectivity Analysis
Theresa Wilson | Paul Hoffmann | Swapna Somasundaran | Jason Kessler | Janyce Wiebe | Yejin Choi | Claire Cardie | Ellen Riloff | Siddharth Patwardhan
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

pdf bib
Playing the Telephone Game: Determining the Hierarchical Structure of Perspective and Speech Expressions
Eric Breck | Claire Cardie
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Weakly Supervised Natural Language Learning Without Redundant Views
Vincent Ng | Claire Cardie
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Bootstrapping Coreference Classifiers with Multiple Machine Learning Algorithms
Vincent Ng | Claire Cardie
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

2002

pdf bib
Selecting sentences for multidocument summaries using randomized local search
Michael White | Claire Cardie
Proceedings of the ACL-02 Workshop on Automatic Summarization

pdf bib
Combining Sample Selection and Error-Driven Pruning for Machine Learning of Coreference Rules
Vincent Ng | Claire Cardie
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

pdf bib
Identifying Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution
Vincent Ng | Claire Cardie
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Improving Machine Learning Approaches to Coreference Resolution
Vincent Ng | Claire Cardie
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

pdf bib
Limitations of Co-Training for Natural Language Learning from Large Datasets
David Pierce | Claire Cardie
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

pdf bib
Multidocument Summarization via Information Extraction
Michael White | Tanya Korelsky | Claire Cardie | Vincent Ng | David Pierce | Kiri Wagstaff
Proceedings of the First International Conference on Human Language Technology Research

2000

pdf bib
Towards Translingual Information Access using Portable Information Extraction
Michael White | Claire Cardie | Chung-hye Han | Nari Kim | Benoit Lavoie | Martha Palmer | Owen Rainbow | Juntae Yoon
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems

pdf bib
Examining the Role of Statistical and Linguistic Knowledge Sources in a General-Knowledge Question-Answering System
Claire Cardie | Vincent Ng | David Pierce | Chris Buckley
Sixth Applied Natural Language Processing Conference

1999

pdf bib
Noun Phrase Coreference as Clustering
Claire Cardie | Kiri Wagstaff
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

pdf bib
The Smart/Empire TIPSTER IR System
Chris Buckley | Janet Walz | Claire Cardie | Scott Mardis | Mandar Mitra | David Pierce | Kiri Wagstaff
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998

pdf bib
Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Claire Cardie | David Pierce
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib
Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Claire Cardie | David Pierce
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1996

pdf bib
Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge
Claire Cardie
Conference on Empirical Methods in Natural Language Processing

1993

pdf bib
UMass/Hughes: Description of the CIRCUS System Used for TIPSTER Text
W. Lehnert | J. McCarthy | S. Soderland | E. Riloff | C. Cardie | J. Peterson | F. Feng
TIPSTER TEXT PROGRAM: PHASE I: Proceedings of a Workshop held at Fredricksburg, Virginia, September 19-23, 1993

pdf bib
UMass/Hughes: Description of the CIRCUS System Used for MUC-5
W. Lehnert | J. McCarthy | S. Soderland | E. Riloff | C. Cardie | J. Peterson | F. Feng
Fifth Message Understanding Conference (MUC-5): Proceedings of a Conference Held in Baltimore, Maryland, August 25-27, 1993

1992

pdf bib
Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics
Claire Cardie
30th Annual Meeting of the Association for Computational Linguistics

pdf bib
University of Massachusetts: MUC-4 Test Results and Analysis
W. Lehnert | C. Cardie | D. Fisher | J. McCarthy | E. Riloff | S. Soderland
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

pdf bib
University of Massachusetts: Description of the CIRCUS System as Used for MUC-4
W. Lehnert | C. Cardie | D. Fisher | J. McCarthy | E. Riloff | S. Soderland
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

1991

pdf bib
University of Massachusetts: MUC-3 Test Results and Analysis
Wendy Lehnert | Claire Cardie | David Fisher | Ellen Riloff | Robert Williams
Third Message Understanding Conference (MUC-3): Proceedings of a Conference Held in San Diego, California, May 21-23, 1991

pdf bib
University of Massachusetts: Description of the CIRCUS System as Used for MUC-3
Wendy Lehnert | Claire Cardie | David Fisher | Ellen Riloff | Robert Williams
Third Message Understanding Conference (MUC-3): Proceedings of a Conference Held in San Diego, California, May 21-23, 1991
