Sanda Harabagiu

Also published as: Sanda M. Harabagiu

2025

Automatically Discovering How Misogyny is Framed on Social Media
Rakshitha Rao Ailneni | Sanda M. Harabagiu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Misogyny, which is widespread on social media, can be identified not only by recognizing its many forms but also by discovering how misogyny is framed. This paper considers the automatic discovery of misogyny problems and their frames through the Dis-MP&F method, which enables the generation of a data-driven, rich Taxonomy of Misogyny (ToM), offering new insights into the complexity of expressions of misogyny. Furthermore, the Dis-MP&F method, informed by the ToM, is capable of producing very promising results on a misogyny benchmark dataset.

2024

Tree-of-Counterfactual Prompting for Zero-Shot Stance Detection
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Stance detection enables the inference of attitudes from human communications. Automatic stance identification has mostly been cast as a classification problem. However, stance decisions involve complex judgments, which can nowadays be generated by prompting Large Language Models (LLMs). In this paper we present a new method for stance identification which (1) relies on a new prompting framework, called Tree-of-Counterfactual prompting; (2) operates not only on textual communications, but also on images; (3) allows more than one stance object type; and (4) requires no examples of stance attribution, thus it is a “Tabula Rasa” Zero-Shot Stance Detection (TR-ZSSD) method. Our experiments indicate surprisingly promising results, outperforming fine-tuned stance detection systems.

Discovering and Articulating Frames of Communication from Social Media Using Chain-of-Thought Reasoning
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Frames of Communication (FoCs) are ubiquitous in social media discourse. They define what counts as a problem, diagnose what is causing the problem, elicit moral judgments and imply remedies for resolving the problem. Most research on automatic frame detection has involved the recognition of the problems addressed by frames, but has not considered the articulation of frames. Articulating an FoC involves reasoning about salient problems, their causes and possible solutions. In this paper we present a method for Discovering and Articulating FoCs (DA-FoC) that relies on a combination of Chain-of-Thought prompting of large language models (LLMs) with In-Context Active Curriculum Learning. Very promising evaluation results indicate that 86.72% of the FoCs encoded by communication experts on the same reference dataset were also uncovered by DA-FoC. Moreover, DA-FoC uncovered many new FoCs that the experts had missed. Interestingly, 55.1% of the known FoCs were judged as being better articulated than the human-written ones, while 93.8% of the new FoCs were judged as having sound rationale and being clearly articulated.

The Impact of Stance Object Type on the Quality of Stance Detection
Maxwell A. Weinzierl | Sanda M. Harabagiu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Stance as an expression of an author’s standpoint and as a means of communication has long been studied by computational linguists. Automatically identifying the stance of a subject toward an object is an active area of research in natural language processing. Significant work has employed topics and claims as the object of stance, with frames of communication only more recently considered as alternative stance objects. However, little attention has been paid to determining the benefits and drawbacks of inferring the stance of a text towards different possible stance objects. In this paper we seek to answer this question by analyzing the implied knowledge and the judgments required when deciding the stance of a text towards each stance object type. Our analysis informed experiments with models capable of inferring the stance of a text towards any of the stance object types considered, namely topics, claims, and frames of communication. Experiments clearly indicate that it is best to infer the stance of a text towards a frame of communication, rather than a claim or a topic. It is also better to infer the stance of a text towards a claim rather than a topic. Therefore, we advocate that rather than continuing efforts to annotate the stance of texts towards topics, it is better to use those efforts to produce annotations towards frames of communication. These efforts will allow us to better capture the stance towards claims and topics as well.

2023

Identification of Multimodal Stance Towards Frames of Communication
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Frames of communication are often evoked in multimedia documents. When an author decides to add an image to a text, one or both of the modalities may evoke a communication frame. Moreover, when evoking the frame, the author also conveys her/his stance towards the frame. Until now, determining whether the author is in favor of, against, or has no stance towards the frame was performed automatically only when processing texts. This is due to the absence of stance annotations on multimedia documents. In this paper we introduce MMVax-Stance, a dataset of 11,300 multimedia documents retrieved from social media, which have stance annotations towards 113 different frames of communication. This dataset allowed us to experiment with several models of multimedia stance detection, which revealed important interactions between texts and images in the inference of stance towards communication frames. To infer the text/image relations, we generated a set of 46,606 synthetic examples of multimodal documents with known stance. This greatly improved the quality of multimedia stance identification, yielding an improvement of 20% in F1-score.

2022

VaccineLies: A Natural Language Resource for Learning to Recognize Misinformation about the COVID-19 and HPV Vaccines
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Billions of COVID-19 vaccine doses have been administered, but many people remain hesitant. Misinformation about the COVID-19 vaccines and other vaccines, propagating on social media, is believed to drive hesitancy towards vaccination. The ability to automatically recognize misinformation targeting vaccines on Twitter depends on the availability of data resources. In this paper we present VaccineLies, a large collection of tweets propagating misinformation about two vaccines: the COVID-19 vaccines and the Human Papillomavirus (HPV) vaccines. Misinformation targets are organized in vaccine-specific taxonomies, which reveal the misinformation themes and concerns. The ontological commitments of the misinformation taxonomies provide an understanding of which misinformation themes and concerns dominate the discourse about the two vaccines covered in VaccineLies. The organization of VaccineLies into training, development and test sets invites the development of novel supervised methods for detecting misinformation on Twitter and identifying the stance towards it. Furthermore, VaccineLies can be a stepping stone for the development of datasets focusing on misinformation targeting additional vaccines.

2020

The Language of Brain Signals: Natural Language Processing of Electroencephalography Reports
Ramon Maldonado | Sanda Harabagiu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Brain signals are captured by clinical electroencephalography (EEG), which is an excellent tool for probing neural function. When EEG tests are performed, the neurologist generates a textual EEG report documenting the findings, using language that describes the brain signals and their clinical correlations. Even with the impetus provided by the BRAIN initiative (braininitiative.nih.gov), no annotations are available in texts that capture the language describing brain activities and their correlations with various pathologies. In this paper we describe an annotation effort carried out on a large corpus of EEG reports, providing examples of EEG-specific and clinically relevant concepts. In addition, we detail our annotation schema for brain signal attributes. We also discuss the resulting annotation of long-distance relations between concepts in EEG reports. Using a self-attention joint-learning model to predict similar annotations in the EEG report corpus, we obtain promising results, and we hope that our effort will inform the design of novel knowledge capture techniques that will include the language of brain signals.

HLTRI at W-NUT 2020 Shared Task-3: COVID-19 Event Extraction from Twitter Using Multi-Task Hopfield Pooling
Maxwell Weinzierl | Sanda Harabagiu
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Extracting structured knowledge involving self-reported events related to the COVID-19 pandemic from Twitter has the potential to inform surveillance systems that play a critical role in public health. The event extraction challenge presented by the W-NUT 2020 Shared Task 3 focused on the identification of five types of events relevant to the COVID-19 pandemic and their respective set of pre-defined slots encoding demographic, epidemiological, clinical as well as spatial, temporal or subjective knowledge. Our participation in the challenge led to the design of a neural architecture for jointly identifying all Event Slots expressed in a tweet relevant to an event of interest. This architecture uses COVID-Twitter-BERT as the pre-trained language model. In addition, to learn text span embeddings for each Event Slot, we relied on a special case of Hopfield Networks, namely Hopfield pooling. The results of the shared task evaluation indicate that our system performs best when it is trained on a larger dataset, while it remains competitive when training on smaller datasets.

2016

Embedding Open-domain Common-sense Knowledge from Text
Travis Goodwin | Sanda Harabagiu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Our ability to understand language often relies on common-sense knowledge: background information the speaker can assume is known by the reader. Similarly, our comprehension of the language used in complex domains relies on access to domain-specific knowledge. Capturing common-sense and domain-specific knowledge can be achieved by taking advantage of recent advances in open information extraction (IE) techniques and, more importantly, of knowledge embeddings, which are multi-dimensional representations of concepts and relations. Building a knowledge graph for representing common-sense knowledge in which concepts discerned from noun phrases are cast as vertices and lexicalized relations are cast as edges leads to learning the embeddings of common-sense knowledge accounting for semantic compositionality as well as implied knowledge. Common-sense knowledge is acquired from a vast collection of blogs and books as well as from WordNet. Similarly, medical knowledge is learned from two large sets of electronic health records. The evaluation results of these two forms of knowledge are promising: the same knowledge acquisition methodology based on learning knowledge embeddings works well both for common-sense knowledge and for medical knowledge. Interestingly, the common-sense knowledge that we acquired was evaluated as being less neutral than the medical knowledge, as it often reflected the opinion of the knowledge utterer. In addition, the acquired medical knowledge was evaluated as more plausible than the common-sense knowledge, reflecting the complexity of acquiring common-sense knowledge due to the pragmatics and economy of language.

2014

Unsupervised Event Coreference Resolution
Cosmin Adrian Bejan | Sanda Harabagiu
Computational Linguistics, Volume 40, Issue 2 - June 2014

Clinical Data-Driven Probabilistic Graph Processing
Travis Goodwin | Sanda Harabagiu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Electronic Medical Records (EMRs) encode an extraordinary amount of medical knowledge. Collecting and interpreting this knowledge, however, requires a significant level of clinical understanding. Automatically capturing the clinical information is crucial for performing comparative effectiveness research. In this paper, we present a data-driven approach to model semantic dependencies between medical concepts, qualified by the beliefs of physicians. The dependencies, captured in a patient cohort graph of clinical pictures and therapies, are further refined into a probabilistic graphical model which enables efficient inference of patient-centered treatment or test recommendations (based on probabilities). To perform inference on the graphical model, we describe a technique of smoothing the conditional likelihood of medical concepts by their semantically-similar belief values. The experimental results, compared against clinical guidelines, are very promising.

Structuring Operative Notes using Active Learning
Kirk Roberts | Sanda Harabagiu | Michael Skinner
Proceedings of BioNLP 2014

2013

The Impact of Selectional Preference Agreement on Semantic Relational Similarity
Bryan Rink | Sanda Harabagiu
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

Recognizing Spatial Containment Relations between Event Mentions
Kirk Roberts | Michael A. Skinner | Sanda M. Harabagiu
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

2012

EmpaTweet: Annotating and Detecting Emotions on Twitter
Kirk Roberts | Michael A. Roach | Joseph Johnson | Josh Guthrie | Sanda M. Harabagiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The rise of micro-blogging in recent years has resulted in significant access to emotion-laden text. Unlike emotion expressed in other textual sources (e.g., blogs, quotes in newswire, email, product reviews, or even clinical text), micro-blogs differ by (1) placing a strict limit on length, resulting in radically new forms of emotional expression, and (2) encouraging users to express their daily thoughts in real time, often resulting in far more emotion statements than might normally occur. In this paper, we introduce a corpus of micro-blog posts (tweets) collected from Twitter and annotated at the tweet level with seven emotions: ANGER, DISGUST, FEAR, JOY, LOVE, SADNESS, and SURPRISE. We analyze how emotions are distributed in the data we annotated and compare this distribution to those in other emotion-annotated corpora. We also use the annotated corpus to train a classifier that automatically discovers the emotions in tweets. In addition, we present an analysis of the linguistic style used for expressing emotions in our corpus. We hope that these observations will lead to the design of novel emotion detection techniques that account for linguistic style and psycholinguistic theories.

Annotating Spatial Containment Relations Between Events
Kirk Roberts | Travis Goodwin | Sanda M. Harabagiu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A significant amount of spatial information in textual documents is hidden within the relationship between events. While humans have an intuitive understanding of these relationships that allows us to recover an object's or event's location, currently no annotated data exists to allow automatic discovery of spatial containment relations between events. We present our process for building such a corpus of manually annotated spatial relations between events. Events form complex predicate-argument structures that model the participants in the event, their roles, as well as the temporal and spatial grounding. In addition, events are not presented in isolation in text; there are explicit and implicit interactions between events that often participate in event structures. In this paper, we focus on five spatial containment relations that may exist between events: (1) SAME, (2) CONTAINS, (3) OVERLAPS, (4) NEAR, and (5) DIFFERENT. Using the transitive closure across these spatial relations, the implicit location of many events and their participants can be discovered. We discuss our annotation schema for spatial containment relations, placing it within the pre-existing theories of spatial representation. We also discuss our annotation guidelines for maintaining annotation quality as well as our process for augmenting SpatialML with spatial containment relations between events. Additionally, we outline some baseline experiments to evaluate the feasibility of developing supervised systems based on this corpus. These results indicate that although the task is challenging, automated methods are capable of discovering spatial containment relations between events.
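The abstract notes that the implicit location of events follows from the transitive closure of the spatial relations. A minimal Python sketch of that idea, under our own assumption that only SAME and CONTAINS are chained (SAME followed by SAME gives SAME; any chain involving CONTAINS gives CONTAINS) and that OVERLAPS, NEAR and DIFFERENT are simply dropped. The helpers `compose` and `close_containment` and the event names are hypothetical; this is not the paper's implementation.

```python
def compose(first, second):
    """Assumed chaining rule (ours, not the paper's):
    SAME then SAME -> SAME; any chain involving CONTAINS -> CONTAINS."""
    return "SAME" if first == second == "SAME" else "CONTAINS"


def close_containment(relations):
    """Transitive closure over (event_a, label, event_b) triples.

    CONTAINS(a, b) is read as "the location of a contains the location of b".
    OVERLAPS, NEAR and DIFFERENT are discarded because they do not chain
    transitively in this simplified sketch.
    """
    facts = {(a, lab, b) for a, lab, b in relations if lab in ("SAME", "CONTAINS")}
    # SAME is symmetric, so add the reverse of every SAME fact.
    facts |= {(b, "SAME", a) for a, lab, b in facts if lab == "SAME"}
    while True:
        derived = set()
        for a, first, b in facts:
            for b2, second, c in facts:
                # Chain a -> b -> c, skipping trivial self-relations.
                if b == b2 and a != c:
                    fact = (a, compose(first, second), c)
                    if fact not in facts:
                        derived.add(fact)
        if not derived:  # fixed point reached
            return facts
        facts |= derived


closed = close_containment([
    ("war", "CONTAINS", "battle"),
    ("battle", "CONTAINS", "ambush"),
    ("ambush", "SAME", "attack"),
    ("war", "NEAR", "border_town"),  # ignored by this sketch
])
```

Here the closure recovers, for example, that the war's location contains the attack, even though no annotation links the two directly.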

UTD: Determining Relational Similarity Using Lexical Patterns
Bryan Rink | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

UTD-SpRL: A Joint Approach to Spatial Role Labeling
Kirk Roberts | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

UTDHLT: COPACETIC System for Choosing Plausible Alternatives
Travis Goodwin | Bryan Rink | Kirk Roberts | Sanda Harabagiu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

A generative model for unsupervised discovery of relations and argument classes from clinical texts
Bryan Rink | Sanda Harabagiu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Unsupervised Learning of Selectional Restrictions and Detection of Argument Coercions
Kirk Roberts | Sanda Harabagiu
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Unsupervised Discovery of Collective Action Frames for Socio-Cultural Analysis
Andrew Hickl | Sanda Harabagiu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

A Linguistic Resource for Semantic Parsing of Motion Events
Kirk Roberts | Srikanth Gullapalli | Cosmin Adrian Bejan | Sanda Harabagiu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a corpus of annotated motion events and their event structure. We consider motion events triggered by a set of motion evoking words and contemplate both literal and figurative interpretations of them. Figurative motion events are extracted into the same event structure but are marked as figurative in the corpus. To represent the event structure of motion, we use the FrameNet annotation standard, which encodes motion in over 70 frames. In order to acquire a diverse set of texts that are different from FrameNet's, we crawled blog and news feeds for five different domains: sports, newswire, finance, military, and gossip. We then annotated these documents with an automatic FrameNet parser. Its output was manually corrected to account for missing and incorrect frames as well as missing and incorrect frame elements. The corpus, UTD-MotionEvent, may act as a resource for semantic parsing, detection of figurative language, spatial reasoning, and other tasks.

Unsupervised Event Coreference Resolution with Rich Linguistic Features
Cosmin Bejan | Sanda Harabagiu
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

UTDMet: Combining WordNet and Corpus Data for Argument Coercion Detection
Kirk Roberts | Sanda Harabagiu
Proceedings of the 5th International Workshop on Semantic Evaluation

UTD: Classifying Semantic Relations by Combining Lexical and Semantic Resources
Bryan Rink | Sanda Harabagiu
Proceedings of the 5th International Workshop on Semantic Evaluation

2008

A Linguistic Resource for Discovering Event Structures and Resolving Event Coreference
Cosmin Bejan | Sanda Harabagiu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we present a linguistic resource that annotates event structures in texts. We consider an event structure as a collection of events that interact with each other in a given situation. We interpret the interactions between events as event relations. In this regard, we propose and annotate a set of six relations that best capture the concept of event structure. These relations are: subevent, reason, purpose, enablement, precedence and related. A document from this resource can encode multiple event structures and an event structure can be described across multiple documents. In order to unify event structures, we also annotate inter- and intra-document event coreference. Moreover, we provide methodologies for automatic discovery of event structures from texts. First, we group the events that constitute an event structure into event clusters and then, we use supervised learning frameworks to classify the relations that exist between events from the same cluster.

2007

UTD-HLT-CG: Semantic Architecture for Metonymy Resolution and Classification of Nominal Relations
Cristina Nicolae | Gabriel Nicolae | Sanda Harabagiu
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Textual Entailment Through Extended Lexical Overlap and Lexico-Semantic Matching
Rod Adams | Gabriel Nicolae | Cristina Nicolae | Sanda Harabagiu
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Impact of Question Decomposition on the Quality of Answer Summaries
Finley Lacatusu | Andrew Hickl | Sanda Harabagiu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Generating answers to complex questions in the form of multi-document summaries requires access to question decomposition methods. In this paper we present three methods for decomposing complex questions and we evaluate their impact on the responsiveness of the answers they enable.

An Answer Bank for Temporal Inference
Sanda Harabagiu | Cosmin Adrian Bejan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Answering questions that ask about temporal information involves several forms of inference. In order to develop question answering capabilities that benefit from temporal inference, we believe that a large corpus of questions and answers that are discovered based on temporal information should be available. This paper describes our methodology for creating AnswerTime-Bank, a large corpus of questions and answers on which Question Answering systems can operate using complex temporal inference.

Methods for Using Textual Entailment in Open-Domain Question Answering
Sanda Harabagiu | Andrew Hickl
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

FERRET: Interactive Question-Answering for Real-World Environments
Andrew Hickl | Patrick Wang | John Lehmann | Sanda Harabagiu
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Using Scenario Knowledge in Automatic Question Answering
Sanda Harabagiu | Andrew Hickl
Proceedings of the Workshop on Task-Focused Summarization and Question Answering

Enhanced Interactive Question-Answering with Conditional Random Fields
Andrew Hickl | Sanda Harabagiu
Proceedings of the Interactive Question Answering Workshop at HLT-NAACL 2006

2005

Experiments with Interactive Question-Answering
Sanda Harabagiu | Andrew Hickl | John Lehmann | Dan Moldovan
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

Incremental Topic Representations
Sanda Harabagiu
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Question Answering Based on Semantic Structures
Srini Narayanan | Sanda Harabagiu
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Multi-Document Summarization Using Multiple-Sequence Alignment
V. Finley Lacatusu | Steven J. Maiorano | Sanda M. Harabagiu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

This paper describes a novel clustering-based text summarization system that uses Multiple Sequence Alignment to improve the alignment of sentences within topic clusters. While most current clustering-based summarization systems base their summaries only on the common information contained in a collection of highly-related sentences, our system constructs more informative summaries that incorporate both the redundant and unique contributions of the sentences in the cluster. When evaluated using ROUGE, the summaries produced by our system represent a substantial improvement over the baseline, which is at 63% of the human performance.
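ROUGE scores a system summary by its n-gram overlap with human reference summaries. The sketch below computes a simplified ROUGE-N recall in Python to illustrate the metric only; it assumes whitespace tokenization and lowercasing with no stemming or stopword removal, so it is not the official ROUGE toolkit or the evaluation configuration used in the paper, and `rouge_n_recall` is a hypothetical helper.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N: clipped n-gram matches divided by reference n-grams.

    Whitespace tokenization and lowercasing only; no stemming or stopwords.
    """
    cand = ngram_counts(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref.lower().split(), n)
        # Clip each match by how often the n-gram appears in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


r1 = rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=1)
r2 = rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=2)
```

With this toy pair, 5 of the 6 reference unigrams are matched (r1 = 5/6) and 3 of the 5 reference bigrams are matched (r2 = 0.6).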

NameNet: a Self-Improving Resource for Name Classification
Paul Morarescu | Sanda Harabagiu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Semantic parsing based on FrameNet
Cosmin Adrian Bejan | Alessandro Moschitti | Paul Morărescu | Gabriel Nicolae | Sanda Harabagiu
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Strategies for Advanced Question Answering
Sanda Harabagiu | Finley Lacatusu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

Answering Questions Using Advanced Semantics and Probabilistic Inference
Srini Narayanan | Sanda Harabagiu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

Intentions, Implicatures and Processing of Complex Questions
Sanda Harabagiu | Steven Maiorano | Alessandro Moschitti | Cosmin Bejan
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

A Novel Approach to Focus Identification in Question/Answering Systems
Alessandro Moschitti | Sanda Harabagiu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

Experiments with Interactive Question Answering in Complex Scenarios
Andrew Hickl | John Lehmann | John Williams | Sanda Harabagiu
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004