Rada Mihalcea

Also published as: Rada F. Mihalcea


2021

CIDER: Commonsense Inference for Dialogue Explanation and Reasoning
Deepanway Ghosal | Pengfei Hong | Siqi Shen | Navonil Majumder | Rada Mihalcea | Soujanya Poria
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Commonsense inference to understand and explain human language is a fundamental research problem in natural language processing. Explaining human conversations poses a great challenge as it requires contextual understanding, planning, inference, and several aspects of reasoning including causal, temporal, and commonsense reasoning. In this work, we introduce CIDER – a manually curated dataset that contains dyadic dialogue explanations in the form of implicit and explicit knowledge triplets inferred using contextual commonsense inference. Extracting such rich explanations from conversations can be conducive to improving several downstream applications. The annotated triplets are categorized by the type of commonsense knowledge present (e.g., causal, conditional, temporal). We set up three different tasks conditioned on the annotated dataset: Dialogue-level Natural Language Inference, Span Extraction, and Multi-choice Span Selection. Baseline results obtained with transformer-based models reveal that the tasks are difficult, paving the way for promising future research. The dataset and the baseline implementations are publicly available at https://github.com/declare-lab/CIDER.

Extractive and Abstractive Explanations for Fact-Checking and Evaluation of News
Ashkan Kazemi | Zehua Li | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

In this paper, we explore the construction of natural language explanations for news claims, with the goal of assisting fact-checking and news evaluation applications. We experiment with two methods: (1) an extractive method based on Biased TextRank – a resource-effective unsupervised graph-based algorithm for content extraction; and (2) an abstractive method based on the GPT-2 language model. We perform comparative evaluations on two misinformation datasets in the political and health news domains, and find that the extractive method shows the most promise.

Evaluating Automatic Speech Recognition Quality and Its Impact on Counselor Utterance Coding
Do June Min | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

Automatic speech recognition (ASR) is a crucial step in many natural language processing (NLP) applications, as the available data often consists mainly of raw speech. Since the result of the ASR step is treated as a meaningful, informative input to later steps in the NLP pipeline, it is important to understand the behavior and failure modes of this step. In this work, we analyze the quality of ASR in the psychotherapy domain, using motivational interviewing conversations between therapists and clients. We conduct domain-agnostic and domain-relevant evaluations using standard evaluation metrics and also identify domain-relevant keywords in the ASR output. Moreover, we empirically study the effect of mixing ASR and manual data during the training of a downstream NLP model, and also demonstrate how additional local context can help alleviate the errors introduced by noisy ASR transcripts.

MUSER: MUltimodal Stress detection using Emotion Recognition as an Auxiliary Task
Yiqun Yao | Michalis Papakostas | Mihai Burzo | Mohamed Abouelenien | Rada Mihalcea
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The capability to automatically detect human stress can benefit artificially intelligent agents involved in affective computing and human-computer interaction. Stress and emotion are both human affective states, and stress has proven to have important implications on the regulation and expression of emotion. Although a series of methods have been established for multimodal stress detection, limited steps have been taken to explore the underlying inter-dependence between stress and emotion. In this work, we investigate the value of emotion recognition as an auxiliary task to improve stress detection. We propose MUSER – a transformer-based model architecture and a novel multi-task learning algorithm with a speed-based dynamic sampling strategy. Evaluation on the Multimodal Stressed Emotion (MuSE) dataset shows that our model is effective for stress detection with both internal and external auxiliary tasks, and achieves state-of-the-art results.

Room to Grow: Understanding Personal Characteristics Behind Self Improvement Using Social Media
MeiXing Dong | Xueming Xu | Yiwei Zhang | Ian Stewart | Rada Mihalcea
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

Many people aim for change, but not everyone succeeds. While there are a number of social psychology theories that propose motivation-related characteristics of those who persist with change, few computational studies have explored the motivational stage of personal change. In this paper, we investigate a new dataset consisting of the writings of people who manifest intention to change, some of whom persist while others do not. Using a variety of linguistic analysis techniques, we first examine the writing patterns that distinguish the two groups of people. Persistent people tend to reference more topics related to long-term self-improvement and use a more complicated writing style. Drawing on these consistent differences, we build a classifier that can reliably identify the people more likely to persist, based on their language. Our experiments provide new insights into the motivation-related behavior of people who persist with their intention to change.

Exploring the Role of Context in Utterance-level Emotion, Act and Intent Classification in Conversations: An Empirical Study
Deepanway Ghosal | Navonil Majumder | Rada Mihalcea | Soujanya Poria
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

How Good Is NLP? A Sober Look at NLP Tasks through the Lens of Social Impact
Zhijing Jin | Geeticka Chauhan | Brian Tse | Mrinmaya Sachan | Rada Mihalcea
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Exploring Self-Identified Counseling Expertise in Online Support Forums
Allison Lahnala | Yuntian Zhao | Charles Welch | Jonathan K. Kummerfeld | Lawrence C An | Kenneth Resnicow | Rada Mihalcea | Verónica Pérez-Rosas
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Biased TextRank: Unsupervised Graph-Based Content Extraction
Ashkan Kazemi | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 28th International Conference on Computational Linguistics

We introduce Biased TextRank, a graph-based content extraction method inspired by the popular TextRank algorithm that ranks text spans according to their importance for language processing tasks and according to their relevance to an input “focus.” Biased TextRank enables focused content extraction for text by modifying the random restarts in the execution of TextRank. The random restart probabilities are assigned based on the relevance of the graph nodes to the focus of the task. We present two applications of Biased TextRank: focused summarization and explanation extraction, and show that our algorithm leads to improved performance on two different datasets by significant ROUGE-N score margins. Much like its predecessor, Biased TextRank is unsupervised, easy to implement and orders of magnitude faster and lighter than current state-of-the-art Natural Language Processing methods for similar tasks.
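
As a rough illustration of the restart-biasing idea described in the abstract above, the following Python sketch runs a PageRank-style power iteration in which the restart distribution is proportional to each node's relevance to the focus. The function name, the damping value of 0.85, the tolerance, and the toy inputs are assumptions made for this sketch and are not taken from the paper. How the similarity graph and the relevance scores are computed is deliberately left out.

```python
def biased_textrank(similarity, focus_relevance, damping=0.85,
                    iterations=100, tol=1e-8):
    """Rank graph nodes with restart probabilities biased toward a focus.

    similarity: square list of lists of non-negative edge weights
        between text spans (hypothetical input for this sketch).
    focus_relevance: non-negative relevance of each span to the focus.
    """
    n = len(similarity)
    total_relevance = sum(focus_relevance)
    # Restart distribution: normalised relevance, or uniform if all zero
    # (the uniform case reduces to ordinary PageRank-style ranking).
    if total_relevance > 0:
        restart = [r / total_relevance for r in focus_relevance]
    else:
        restart = [1.0 / n] * n
    # Total outgoing weight of each node, used to normalise its edges.
    out_weight = [sum(row) for row in similarity]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Score flowing into node i from every node with outgoing edges.
            incoming = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) * restart[i] + damping * incoming)
        change = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:
            break
    return scores


# Toy example: three fully connected spans, with only span 0 relevant
# to the focus, so restarts always land on span 0.
toy_similarity = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
biased_scores = biased_textrank(toy_similarity, [1.0, 0.0, 0.0])
uniform_scores = biased_textrank(toy_similarity, [1.0, 1.0, 1.0])
```

In this toy graph the uniform restart gives all three spans the same score, while the biased restart ranks span 0 above the other two. The sketch does not redistribute the score of nodes without outgoing edges.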

“Judge me by my size (noun), do you?” YodaLib: A Demographic-Aware Humor Generation Framework
Aparna Garimella | Carmen Banea | Nabil Hossain | Rada Mihalcea
Proceedings of the 28th International Conference on Computational Linguistics

The subjective nature of humor makes computerized humor generation a challenging task. We propose an automatic humor generation framework for filling the blanks in Mad Libs® stories, while accounting for the demographic backgrounds of the desired audience. We collect a dataset consisting of such stories, which are filled in and judged by carefully selected workers on Amazon Mechanical Turk. We build upon the BERT platform to predict location-biased word fillings in incomplete sentences, and we fine-tune BERT to classify location-specific humor in a sentence. We leverage these components to produce YodaLib, a fully-automated Mad Libs style humor generation framework, which selects and ranks appropriate candidate words and sentences in order to generate a coherent and funny story tailored to certain demographics. Our experimental results indicate that YodaLib outperforms a previous semi-automated approach proposed for this task, while also surpassing human annotators in both qualitative and quantitative analyses.

Exploring the Value of Personalized Word Embeddings
Charles Welch | Jonathan K. Kummerfeld | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we introduce personalized word embeddings, and examine their value for language modeling. We compare the performance of our proposed prediction model when using personalized versus generic word representations, and study how these representations can be leveraged for improved performance. We provide insight into what types of words can be more accurately predicted when building personalized models. Our results show that a subset of words belonging to specific psycholinguistic categories tend to vary more in their representations across users and that combining generic and personalized word embeddings yields the best performance, with a 4.7% relative reduction in perplexity. Additionally, we show that a language model using personalized word embeddings can be effectively used for authorship attribution.

MuSE: a Multimodal Dataset of Stressed Emotion
Mimansa Jaiswal | Cristian-Paul Bara | Yuanhang Luo | Mihai Burzo | Rada Mihalcea | Emily Mower Provost
Proceedings of the 12th Language Resources and Evaluation Conference

Endowing automated agents with the ability to provide support, entertainment and interaction with human beings requires sensing of the users’ affective state. These affective states are impacted by a combination of emotion inducers, current psychological state, and various conversational factors. Although emotion classification in both singular and dyadic settings is an established area, the effects of these additional factors on the production and perception of emotion is understudied. This paper presents a new dataset, Multimodal Stressed Emotion (MuSE), to study the multimodal interplay between the presence of stress and expressions of affect. We describe the data collection protocol, the possible areas of use, and the annotations for the emotional content of the recordings. The paper also presents several baselines to measure the performance of multimodal features for emotion and stress classification.

LifeQA: A Real-life Dataset for Video Question Answering
Santiago Castro | Mahmoud Azab | Jonathan Stroud | Cristina Noujaim | Ruoyao Wang | Jia Deng | Rada Mihalcea
Proceedings of the 12th Language Resources and Evaluation Conference

We introduce LifeQA, a benchmark dataset for video question answering that focuses on day-to-day real-life situations. Current video question answering datasets consist of movies and TV shows. However, it is well-known that these visual domains are not representative of our day-to-day lives. Movies and TV shows, for example, benefit from professional camera movements, clean editing, crisp audio recordings, and scripted dialog between professional actors. While these domains provide a large amount of data for training models, their properties make them unsuitable for testing real-life question answering systems. Our dataset, by contrast, consists of video clips that represent only real-life scenarios. We collect 275 such video clips and over 2.3k multiple-choice questions. In this paper, we analyze the challenging but realistic aspects of LifeQA, and we apply several state-of-the-art video question answering models to provide benchmarks for future research. The full dataset is publicly available at https://lit.eecs.umich.edu/lifeqa/.

Small Town or Metropolis? Analyzing the Relationship between Population Size and Language
Amy Rechkemmer | Steven Wilson | Rada Mihalcea
Proceedings of the 12th Language Resources and Evaluation Conference

The variance in language used by different cultures has been a topic of study for researchers in linguistics and psychology, but oftentimes, language is compared across multiple countries in order to show a difference in culture. As a geographically large country that is diverse in population in terms of the background and experiences of its citizens, the U.S. also contains cultural differences within its own borders. Using a set of over 2 million posts from distinct Twitter users around the country dating back as far as 2014, we ask the following question: is there a difference in how Americans express themselves online depending on whether they reside in an urban or rural area? We categorize Twitter users as either urban or rural and identify ideas and language that are more commonly expressed in tweets written by one population over the other. We take this further by analyzing how the language from specific cities of the U.S. compares to the language of other cities and by training predictive models to predict whether a user is from an urban or rural area. We publicly release the tweet and user IDs that can be used to reconstruct the dataset for future studies in this direction.

Inferring Social Media Users’ Mental Health Status from Multimodal Information
Zhentao Xu | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 12th Language Resources and Evaluation Conference

Worldwide, an increasing number of people are suffering from mental health disorders such as depression and anxiety. In the United States alone, one in every four adults suffers from a mental health condition, which makes mental health a pressing concern. In this paper, we explore the use of multimodal cues present in social media posts to predict users’ mental health status. Specifically, we focus on identifying social media activity that either indicates a mental health condition or its onset. We collect posts from Flickr and apply a multimodal approach that consists of jointly analyzing language, visual, and metadata cues and their relation to mental health. We conduct several classification experiments aiming to discriminate between (1) healthy users and users affected by a mental health illness; and (2) healthy users and users prone to mental illness. Our experimental results indicate that using multiple modalities can improve the performance of this classification task as compared to the use of one modality at a time, and can provide important cues into a user’s mental status.

COSMIC: COmmonSense knowledge for eMotion Identification in Conversations
Deepanway Ghosal | Navonil Majumder | Alexander Gelbukh | Rada Mihalcea | Soujanya Poria
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we address the task of utterance-level emotion recognition in conversations using commonsense knowledge. We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations, and builds upon them to learn interactions between interlocutors participating in a conversation. Current state-of-the-art methods often encounter difficulties in context propagation, emotion shift detection, and differentiating between related emotion classes. By learning distinct commonsense representations, COSMIC addresses these challenges and achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets. Our code is available at https://github.com/declare-lab/conv-emotion.

KinGDOM: Knowledge-Guided DOMain Adaptation for Sentiment Analysis
Deepanway Ghosal | Devamanyu Hazarika | Abhinaba Roy | Navonil Majumder | Rada Mihalcea | Soujanya Poria
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Cross-domain sentiment analysis has received significant attention in recent years, prompted by the need to combat the domain gap between different applications that make use of sentiment analysis. In this paper, we take a novel perspective on this task by exploring the role of external commonsense knowledge. We introduce a new framework, KinGDOM, which utilizes the ConceptNet knowledge graph to enrich the semantics of a document by providing both domain-specific and domain-general background concepts. These concepts are learned by training a graph convolutional autoencoder that leverages inter-domain concepts in a domain-invariant manner. Conditioning a popular domain-adversarial baseline method with these learned concepts helps improve its performance over state-of-the-art approaches, demonstrating the efficacy of our proposed framework.

Counseling-Style Reflection Generation Using Generative Pretrained Transformers with Augmented Context
Siqi Shen | Charles Welch | Rada Mihalcea | Verónica Pérez-Rosas
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

We introduce a counseling dialogue system that seeks to assist counselors while they are learning and refining their counseling skills. The system generates counselors’ reflections – i.e., responses that reflect back on what the client has said given the dialogue history. Our method builds upon the new generative pretrained transformer architecture and enhances it with context augmentation techniques inspired by traditional strategies used during counselor training. Through a set of comparative experiments, we show that the system that incorporates these strategies performs better in the reflection generation task than a system that is just fine-tuned with counseling conversations. To confirm our findings, we present a human evaluation study that shows that our system generates natural-looking reflections that are also stylistically and grammatically correct.

Compositional Demographic Word Embeddings
Charles Welch | Jonathan K. Kummerfeld | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word embeddings are usually derived from corpora containing text from many individuals, thus leading to general purpose representations rather than individually personalized representations. While personalized embeddings can be useful to improve language model performance and other language processing tasks, they can only be computed for people with a large amount of longitudinal data, which is not the case for new users. We propose a new form of personalized word embeddings that use demographic-specific word representations derived compositionally from full or partial demographic information for a user (i.e., gender, age, location, religion). We show that the resulting demographic-aware word representations outperform generic word representations on two tasks for English: language modeling and word associations. We further explore the trade-off between the number of available attributes and their relative effectiveness and discuss the ethical implications of using them.

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation
Charles Welch | Rada Mihalcea | Jonathan K. Kummerfeld
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Many NLP applications, such as biomedical data and technical support, have 10-100 million tokens of in-domain data and limited computational resources for learning from it. How should we train a language model in this scenario? Most language modeling research considers either a small dataset with a closed vocabulary (like the standard 1 million token Penn Treebank), or the whole web with byte-pair encoding. We show that for our target setting in English, initialising and freezing input embeddings using in-domain data can improve language model performance by providing a useful representation of rare words, and this pattern holds across several different domains. In the process, we show that the standard convention of tying input and output embeddings does not improve perplexity when initializing with embeddings trained on in-domain data.

MIME: MIMicking Emotions for Empathetic Response Generation
Navonil Majumder | Pengfei Hong | Shanshan Peng | Jiankun Lu | Deepanway Ghosal | Alexander Gelbukh | Rada Mihalcea | Soujanya Poria
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Current approaches to empathetic response generation view the set of emotions expressed in the input text as a flat structure, where all the emotions are treated uniformly. We argue that empathetic responses often mimic the emotion of the user to a varying degree, depending on its positivity or negativity and content. We show that the consideration of these polarity-based emotion clusters and emotional mimicry results in improved empathy and contextual relevance of the response as compared to the state-of-the-art. Also, we introduce stochasticity into the emotion mixture that yields emotionally more varied empathetic responses than the previous work. We demonstrate the importance of these factors to empathetic response generation using both automatic- and human-based evaluations. The implementation of MIME is publicly available at https://github.com/declare-lab/MIME.

Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
Karin Verspoor | Kevin Bretonnel Cohen | Michael Conway | Berry de Bruijn | Mark Dredze | Rada Mihalcea | Byron Wallace
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

Expressive Interviewing: A Conversational System for Coping with COVID-19
Charles Welch | Allison Lahnala | Veronica Perez-Rosas | Siqi Shen | Sarah Seraj | Larry An | Kenneth Resnicow | James Pennebaker | Rada Mihalcea
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The ongoing COVID-19 pandemic has raised concerns for many regarding personal and public health implications, financial security and economic stability. Alongside many other unprecedented challenges, there are increasing concerns over social isolation and mental health. We introduce Expressive Interviewing – an interview-style conversational system that draws on ideas from motivational interviewing and expressive writing. Expressive Interviewing seeks to encourage users to express their thoughts and feelings through writing by asking them questions about how COVID-19 has impacted their lives. We present relevant aspects of the system’s design and implementation as well as quantitative and qualitative analyses of user interactions with the system. In addition, we conduct a comparative evaluation with a general-purpose dialogue system for mental health that shows our system’s potential in helping users to cope with COVID-19 issues.

Quantifying the Effects of COVID-19 on Mental Health Support Forums
Laura Biester | Katie Matton | Janarthanan Rajendran | Emily Mower Provost | Rada Mihalcea
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The COVID-19 pandemic, like many of the disease outbreaks that have preceded it, is likely to have a profound effect on mental health. Understanding its impact can inform strategies for mitigating negative consequences. In this work, we seek to better understand the effects of COVID-19 on mental health by examining discussions within mental health support communities on Reddit. First, we quantify the rate at which COVID-19 is discussed in each community, or subreddit, in order to understand levels of pandemic-related discussion. Next, we examine the volume of activity in order to determine whether the number of people discussing mental health has risen. Finally, we analyze how COVID-19 has influenced language use and topics of discussion within each subreddit.

Building Location Embeddings from Physical Trajectories and Textual Representations
Laura Biester | Carmen Banea | Rada Mihalcea
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Word embedding methods have become the de-facto way to represent words, having been successfully applied to a wide array of natural language processing tasks. In this paper, we explore the hypothesis that embedding methods can also be effectively used to represent spatial locations. Using a new dataset consisting of the location trajectories of 729 students over a seven month period and text data related to those locations, we implement several strategies to create location embeddings, which we then use to create embeddings of the sequences of locations a student has visited. To identify the surface level properties captured in the representations, we propose a number of probing tasks such as the presence of a specific location in a sequence or the type of activities that take place at a location. We then leverage the representations we generated and employ them in more complex downstream tasks ranging from predicting a student’s area of study to a student’s depression level, showing the effectiveness of these location embeddings.

2019

Towards Extracting Medical Family History from Natural Language Interactions: A New Dataset and Baselines
Mahmoud Azab | Stephane Dadian | Vivi Nastase | Larry An | Rada Mihalcea
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We introduce a new dataset consisting of natural language interactions annotated with medical family histories, obtained during interactions with a genetic counselor and through crowdsourcing, following a questionnaire created by experts in the domain. We describe the data collection process and the annotations performed by medical professionals, including illness and personal attributes (name, age, gender, family relationships) for the patient and their family members. An initial system that performs argument identification and relation extraction shows promising results – average F-score of 0.87 on complex sentences on the targeted relations.

Box of Lies: Multimodal Deception Detection in Dialogues
Felix Soldner | Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Deception often takes place during everyday conversations, yet conversational dialogues remain largely unexplored by current work on automatic deception detection. In this paper, we address the task of detecting multimodal deceptive cues during conversational dialogues. We introduce a multimodal dataset containing deceptive conversations between participants playing the Box of Lies game from The Tonight Show Starring Jimmy Fallon, in which they try to guess whether an object description provided by their opponent is deceptive or not. We conduct annotations of multimodal communication behaviors, including facial and linguistic behaviors, and derive several learning features based on these annotations. Initial classification experiments show promising results, performing well above both a random and a human baseline, and reaching up to 69% accuracy in distinguishing deceptive and truthful behaviors.

Representing Movie Characters in Dialogues
Mahmoud Azab | Noriyuki Kojima | Jia Deng | Rada Mihalcea
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We introduce a new embedding model to represent movie characters and their interactions in a dialogue by encoding in the same representation the language used by these characters as well as information about the other participants in the dialogue. We evaluate the performance of these new character embeddings on two tasks: (1) character relatedness, using a dataset we introduce consisting of a dense character interaction matrix for 4,378 unique character pairs over 22 hours of dialogue from eighteen movies; and (2) character relation classification, for fine- and coarse-grained relations, as well as sentiment relations. Our experiments show that our model significantly outperforms the traditional Word2Vec continuous bag-of-words and skip-gram models, demonstrating the effectiveness of the character embeddings we introduce. We further show how these embeddings can be used in conjunction with a visual question answering system to improve over previous results.

Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Rada Mihalcea | Ekaterina Shutova | Lun-Wei Ku | Kilian Evang | Soujanya Poria
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Multi-Label Transfer Learning for Multi-Relational Semantic Similarity
Li Zhang | Steven Wilson | Rada Mihalcea
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.
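
The loss aggregation mentioned in the abstract above can be pictured with a small Python sketch. It assumes that each relation is scored with mean squared error and that the per-relation losses are simply summed. The abstract only says the losses are aggregated, so the loss function, the summation, the function names, and the toy numbers are all assumptions of this sketch. The LSTM encoder that would produce the predictions is omitted.

```python
def mean_squared_error(predicted, gold):
    """Average squared difference between predicted and gold scores."""
    return sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)


def aggregated_loss(predictions, targets):
    """Sum the per-relation regression losses into one training signal.

    predictions, targets: dicts mapping a relation name to a list of
        scores, one score per text pair (hypothetical layout).
    """
    return sum(
        mean_squared_error(predictions[relation], targets[relation])
        for relation in targets
    )


# Toy batch of two text pairs scored on two relations.
toy_predictions = {"similarity": [1.0, 2.0], "relatedness": [0.0, 4.0]}
toy_targets = {"similarity": [1.0, 4.0], "relatedness": [1.0, 4.0]}
toy_loss = aggregated_loss(toy_predictions, toy_targets)  # 2.0 + 0.5 = 2.5
```

Because a single scalar is back-propagated, every relation contributes to the update of the shared parameters in the same step, which is what distinguishes this setup from training one model per relation.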

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Soujanya Poria | Devamanyu Hazarika | Navonil Majumder | Gautam Naik | Erik Cambria | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Emotion recognition in conversations is a challenging task that has recently gained popularity due to its potential applications. Until now, however, a large-scale multimodal multi-party emotional conversational database containing more than two speakers per dialogue was missing. Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. MELD contains about 13,000 utterances from 1,433 dialogues from the TV-series Friends. Each utterance is annotated with emotion and sentiment labels, and encompasses audio, visual and textual modalities. We propose several strong multimodal baselines and show the importance of contextual and multimodal information for emotion recognition in conversations. The full dataset is available for use at http://affective-meld.github.io.

What Makes a Good Counselor? Learning to Distinguish between High-quality and Low-quality Counseling Conversations
Verónica Pérez-Rosas | Xinyi Wu | Kenneth Resnicow | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The quality of a counseling intervention relies highly on the active collaboration between clients and counselors. In this paper, we explore several linguistic aspects of the collaboration process occurring during counseling conversations. Specifically, we address the differences between high-quality and low-quality counseling. Our approach examines participants’ turn-by-turn interaction, their linguistic alignment, the sentiment expressed by speakers during the conversation, as well as the different topics being discussed. Our results suggest important language differences in low- and high-quality counseling, which we further use to derive linguistic features able to capture the differences between the two groups. These features are then used to build automatic classifiers that can predict counseling quality with accuracies of up to 88%.

pdf bib
Predicting Human Activities from User-Generated Content
Steven Wilson | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The activities we do are linked to our interests, personality, political preferences, and decisions we make about the future. In this paper, we explore the task of predicting human activities from user-generated content. We collect a dataset containing instances of social media users writing about a range of everyday activities. We then use a state-of-the-art sentence embedding framework tailored to recognize the semantics of human activities and perform an automatic clustering of these activities. We train a neural network model to make predictions about which clusters contain activities that were performed by a given user based on the text of their previous posts and self-description. Additionally, we explore the degree to which incorporating inferred user traits into our model helps with this prediction task.

pdf bib
Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing
Aparna Garimella | Carmen Banea | Dirk Hovy | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a particular gender, but models for part-of-speech tagging and dependency parsing have still not adapted to account for these differences. To address this, we annotate the Wall Street Journal part of the Penn Treebank with the gender information of the articles’ authors, and build taggers and parsers trained on this data that show performance differences in text written by men and women. Further analyses reveal numerous part-of-speech tags and syntactic relations whose prediction performances benefit from the prevalence of a specific gender in the training data. The results underscore the importance of accounting for gendered differences in syntactic tasks, and outline future avenues for developing more accurate taggers and parsers. We release our data to the research community.

pdf bib
Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)
Santiago Castro | Devamanyu Hazarika | Verónica Pérez-Rosas | Roger Zimmermann | Rada Mihalcea | Soujanya Poria
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Sarcasm is often expressed through several verbal and non-verbal cues, e.g., a change of tone, overemphasis on a word, a drawn-out syllable, or a straight-looking face. Most of the recent work in sarcasm detection has been carried out on textual data. In this paper, we argue that incorporating multimodal cues can improve the automatic classification of sarcasm. As a first step towards enabling the development of multimodal approaches for sarcasm detection, we propose a new sarcasm dataset, Multimodal Sarcasm Detection Dataset (MUStARD), compiled from popular TV shows. MUStARD consists of audiovisual utterances annotated with sarcasm labels. Each utterance is accompanied by its context of historical utterances in the dialogue, which provides additional information on the scenario where the utterance occurs. Our initial results show that the use of multimodal information can reduce the relative error rate of sarcasm detection by up to 12.9% in F-score when compared to the use of individual modalities. The full dataset is publicly available for use at https://github.com/soujanyaporia/MUStARD.

pdf bib
Identifying Visible Actions in Lifestyle Vlogs
Oana Ignat | Laura Burdick | Jia Deng | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We consider the task of identifying human actions visible in online videos. We focus on the widespread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify if actions mentioned in the speech description of a video are visually present. We construct a dataset with crowdsourced manual annotations of visible actions, and introduce a multimodal algorithm that leverages information derived from visual and linguistic clues to automatically infer which actions are visible in a video.

2018

pdf bib
CASCADE: Contextual Sarcasm Detection in Online Discussion Forums
Devamanyu Hazarika | Soujanya Poria | Sruthi Gorantla | Erik Cambria | Roger Zimmermann | Rada Mihalcea
Proceedings of the 27th International Conference on Computational Linguistics

The literature in automated sarcasm detection has mainly focused on lexical-, syntactic- and semantic-level analysis of text. However, a sarcastic sentence can be expressed with contextual presumptions, background and commonsense knowledge. In this paper, we propose a ContextuAl SarCasm DEtector (CASCADE), which adopts a hybrid approach of both content- and context-driven modeling for sarcasm detection in online social media discussions. For the latter, CASCADE aims at extracting contextual information from the discourse of a discussion thread. Also, since the sarcastic nature and form of expression can vary from person to person, CASCADE utilizes user embeddings that encode stylometric and personality features of users. When used along with content-based feature extractors such as convolutional neural networks, we see a significant boost in the classification performance on a large Reddit corpus.

pdf bib
Automatic Detection of Fake News
Verónica Pérez-Rosas | Bennett Kleinberg | Alexandra Lefevre | Rada Mihalcea
Proceedings of the 27th International Conference on Computational Linguistics

The proliferation of misleading information in everyday access media outlets such as social media feeds, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insights into the reliability of online content. In this paper, we focus on the automatic identification of fake content in online news. Our contribution is twofold. First, we introduce two novel datasets for the task of fake news detection, covering seven different news domains. We describe the collection, annotation, and validation process in detail and present several exploratory analyses on the identification of linguistic differences in fake and legitimate news content. Second, we conduct a set of learning experiments to build accurate fake news detectors, and show that we can achieve accuracies of up to 76%. In addition, we provide comparative analyses of the automatic and manual identification of fake news.

pdf bib
Factors Influencing the Surprising Instability of Word Embeddings
Laura Wendlandt | Jonathan K. Kummerfeld | Rada Mihalcea
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.

pdf bib
Speaker Naming in Movies
Mahmoud Azab | Mingzhe Wang | Max Smith | Noriyuki Kojima | Jia Deng | Rada Mihalcea
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We propose a new model for speaker naming in movies that leverages visual, textual, and acoustic modalities in a unified optimization framework. To evaluate the performance of our model, we introduce a new dataset consisting of six episodes of the Big Bang Theory TV show and eighteen full movies covering different genres. Our experiments show that our multimodal model significantly outperforms several competitive baselines on the average weighted F-score metric. To demonstrate the effectiveness of our framework, we design an end-to-end memory network model that leverages our speaker naming model and achieves state-of-the-art results on the subtitles task of the MovieQA 2017 Challenge.

pdf bib
World Knowledge for Abstract Meaning Representation Parsing
Charles Welch | Jonathan K. Kummerfeld | Song Feng | Rada Mihalcea
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Analyzing the Quality of Counseling Conversations: the Tell-Tale Signs of High-quality Counseling
Verónica Pérez-Rosas | Xuetong Sun | Christy Li | Yuchen Wang | Kenneth Resnicow | Rada Mihalcea
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection
Devamanyu Hazarika | Soujanya Poria | Rada Mihalcea | Erik Cambria | Roger Zimmermann
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Emotion recognition in conversations is crucial for building empathetic machines. Present works in this domain do not explicitly consider the inter-personal influences that thrive in the emotional dynamics of dialogues. To this end, we propose Interactive COnversational memory Network (ICON), a multimodal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models the self- and inter-speaker emotional influences into global memories. Such memories generate contextual summaries which aid in predicting the emotional orientation of utterance-videos. Our model outperforms state-of-the-art networks on multiple classification and regression tasks in two benchmark datasets.

2017

pdf bib
Understanding and Predicting Empathic Behavior in Counseling Therapy
Verónica Pérez-Rosas | Rada Mihalcea | Kenneth Resnicow | Satinder Singh | Lawrence An
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Counselor empathy is associated with better outcomes in psychology and behavioral counseling. In this paper, we explore several aspects pertaining to counseling interaction dynamics and their relation to counselor empathy during motivational interviewing encounters. Particularly, we analyze aspects such as participants’ engagement, participants’ verbal and nonverbal accommodation, as well as topics being discussed during the conversation, with the final goal of identifying linguistic and acoustic markers of counselor empathy. We also show how we can use these findings alongside other raw linguistic and acoustic features to build accurate counselor empathy classifiers with accuracies of up to 80%.

pdf bib
Demographic-aware word associations
Aparna Garimella | Carmen Banea | Rada Mihalcea
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Variations of word associations across different groups of people can provide insights into people’s psychologies and their world views. To capture these variations, we introduce the task of demographic-aware word associations. We build a new gold standard dataset consisting of word association responses for approximately 300 stimulus words, collected from more than 800 respondents of different gender (male/female) and from different locations (India/United States), and show that there are significant variations in the word associations made by these groups. We also introduce a new demographic-aware word association model based on a neural net skip-gram architecture, and show how computational methods for measuring word associations that specifically account for writer demographics can outperform generic methods that are agnostic to such information.

pdf bib
Identifying Usage Expression Sentences in Consumer Product Reviews
Shibamouli Lahiri | V.G.Vinod Vydiswaran | Rada Mihalcea
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper we introduce the problem of identifying usage expression sentences in a consumer product review. We create a human-annotated gold standard dataset of 565 reviews spanning five distinct product categories. Our dataset consists of more than 3,000 annotated sentences. We further introduce a classification system to label sentences according to whether or not they describe some “usage”. The system combines lexical, syntactic, and semantic features in a product-agnostic fashion to yield good classification performance. We show the effectiveness of our approach using importance ranking of features, error analysis, and cross-product classification experiments.

pdf bib
Measuring Semantic Relations between Human Activities
Steven Wilson | Rada Mihalcea
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The things people do in their daily lives can provide valuable insights into their personality, values, and interests. Unstructured text data on social media platforms are rich in behavioral content, and automated systems can be deployed to learn about human activity on a broad scale if these systems are able to reason about the content of interest. In order to aid in the evaluation of such systems, we introduce a new phrase-level semantic textual similarity dataset comprised of human activity phrases, providing a testbed for automated systems that analyze relationships between phrasal descriptions of people’s actions. Our set of 1,000 pairs of activities is annotated by human judges across four relational dimensions including similarity, relatedness, motivational alignment, and perceived actor congruence. We evaluate a set of strong baselines for the task of generating scores that correlate highly with human ratings, and we introduce several new approaches to the phrase-level similarity task in the domain of human activities.

pdf bib
Identity Deception Detection
Verónica Pérez-Rosas | Quincy Davenport | Anna Mengdan Dai | Mohamed Abouelenien | Rada Mihalcea
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This paper addresses the task of detecting identity deception in language. Using a novel identity deception dataset, consisting of real and portrayed identities from 600 individuals, we show that we can build accurate identity detectors targeting both age and gender, with accuracies of up to 88%. We also perform an analysis of the linguistic patterns used in identity deception, which leads to interesting insights into identity portrayers.

pdf bib
Predicting Counselor Behaviors in Motivational Interviewing Encounters
Verónica Pérez-Rosas | Rada Mihalcea | Kenneth Resnicow | Satinder Singh | Lawrence An | Kathy J. Goggin | Delwyn Catley
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

As the number of people receiving psycho-therapeutic treatment increases, the automatic evaluation of counseling practice arises as an important challenge in the clinical domain. In this paper, we address the automatic evaluation of counseling performance by analyzing counselors’ language during their interaction with clients. In particular, we present a model towards the automation of Motivational Interviewing (MI) coding, which is the current gold standard to evaluate MI counseling. First, we build a dataset of hand-labeled MI encounters; second, we use text-based methods to extract and analyze linguistic patterns associated with counselor behaviors; and third, we develop an automatic system to predict these behaviors. We introduce a new set of features based on semantic information and syntactic patterns, and show that they lead to accuracy figures of up to 90%, which represent a significant improvement with respect to features used in the past.

pdf bib
A Computational Analysis of the Language of Drug Addiction
Carlo Strapparava | Rada Mihalcea
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a computational analysis of the language of drug users when talking about their drug experiences. We introduce a new dataset of over 4,000 descriptions of experiences reported by users of four main drug types, and show that we can predict with an F1-score of up to 88% the drug behind a certain experience. We also perform an analysis of the dominant psycholinguistic processes and dominant emotions associated with each drug type, which sheds light on the characteristics of drug users.

2016

pdf bib
Finding Optimists and Pessimists on Twitter
Xianzhi Ruan | Steven Wilson | Rada Mihalcea
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Building a Dataset for Possessions Identification in Text
Carmen Banea | Xi Chen | Rada Mihalcea
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Just as industrialization matured from mass production to customization and personalization, so has the Web migrated from generic content to public disclosures of one’s most intimately held thoughts, opinions and beliefs. This relatively new type of data is able to represent finer and more narrowly defined demographic slices. While researchers have until now primarily focused on leveraging personalized content to identify latent information such as gender, nationality, location, or age of the author, this study seeks to establish a structured way of extracting possessions, or items that people own or are entitled to, as a way to ultimately provide insights into people’s behaviors and characteristics. In order to promote more research in this area, we are releasing a set of 798 possessions extracted from the blog genre, where possessions are marked at different confidence levels, as well as a detailed set of guidelines to help in future annotation studies.

pdf bib
Building a Motivational Interviewing Dataset
Verónica Pérez-Rosas | Rada Mihalcea | Kenneth Resnicow | Satinder Singh | Lawrence An
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

pdf bib
Zooming in on Gender Differences in Social Media
Aparna Garimella | Rada Mihalcea
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Men are from Mars and women are from Venus - or so the genre of relationship literature would have us believe. But there is some truth in this idea, and researchers in fields as diverse as psychology, sociology, and linguistics have explored ways to better understand the differences between genders. In this paper, we take another look at the problem of gender discrimination and attempt to move beyond the typical surface-level text classification approach, by (1) identifying semantic and psycholinguistic word classes that reflect systematic differences between men and women and (2) finding differences between genders in the ways they use the same words. We describe several experiments and report results on a large collection of blogs authored by men and women.

pdf bib
Disentangling Topic Models: A Cross-cultural Analysis of Personal Values through Words
Steven Wilson | Rada Mihalcea | Ryan Boyd | James Pennebaker
Proceedings of the First Workshop on NLP and Computational Social Science

pdf bib
Identifying Cross-Cultural Differences in Word Usage
Aparna Garimella | Rada Mihalcea | James Pennebaker
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Personal writings have inspired researchers in the fields of linguistics and psychology to study the relationship between language and culture to better understand the psychology of people across different cultures. In this paper, we explore this relation by developing cross-cultural word models to identify words with cultural bias – i.e., words that are used in significantly different ways by speakers from different cultures. Focusing specifically on two cultures, the United States and Australia, we identify a set of words with significant usage differences, and further investigate these words through feature analysis and topic modeling, shedding light on the attributes of language that contribute to these differences.

pdf bib
Targeted Sentiment to Understand Student Comments
Charles Welch | Rada Mihalcea
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We address the task of targeted sentiment as a means of understanding the sentiment that students hold toward courses and instructors, as expressed by students in their comments. We introduce a new dataset consisting of student comments annotated for targeted sentiment and describe a system that can both identify the courses and instructors mentioned in student comments, as well as label the students’ sentiment toward those entities. Through several comparative evaluations, we show that our system outperforms previous work on a similar task.

pdf bib
SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
Eneko Agirre | Carmen Banea | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Rada Mihalcea | German Rigau | Janyce Wiebe
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Experiments in Open Domain Deception Detection
Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Verbal and Nonverbal Clues for Real-life Deception Detection
Verónica Pérez-Rosas | Mohamed Abouelenien | Rada Mihalcea | Yao Xiao | CJ Linton | Mihai Burzo
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Co-Training for Topic Classification of Scholarly Data
Cornelia Caragea | Florin Bulgarov | Rada Mihalcea
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Iñigo Lopez-Gazpio | Montse Maritxalar | Rada Mihalcea | German Rigau | Larraitz Uria | Janyce Wiebe
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Rada Mihalcea | Joyce Chai | Anoop Sarkar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Using Word Semantics To Assist English as a Second Language Learners
Mahmoud Azab | Chris Hokamp | Rada Mihalcea
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

2014

pdf bib
Iterative Constrained Clustering for Subjectivity Word Sense Disambiguation
Cem Akkaya | Janyce Wiebe | Rada Mihalcea
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Rada Mihalcea | German Rigau | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
SimCompass: Using Deep Learning Word Embeddings to Assess Cross-level Similarity
Carmen Banea | Di Chen | Rada Mihalcea | Claire Cardie | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Building a Dataset for Summarization and Keyword Extraction from Emails
Vanessa Loza | Shibamouli Lahiri | Rada Mihalcea | Po-Hsiang Lai
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces a new email dataset, consisting of both single and thread emails, manually annotated with summaries and keywords. A total of 349 emails and threads have been annotated. The dataset is our first step toward developing automatic methods for summarization and keyword extraction from emails. We describe the email corpus, along with the annotation interface, annotator guidelines, and agreement studies.

pdf bib
Modeling Language Proficiency Using Implicit Feedback
Chris Hokamp | Rada Mihalcea | Peter Schuelke
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We describe the results of several experiments with interactive interfaces for native and L2 English students, designed to collect implicit feedback from students as they complete a reading activity. In this study, implicit means that all data is obtained without asking the user for feedback. To test the value of implicit feedback for assessing student proficiency, we collect features of user behavior and interaction, which are then used to train classification models. Based upon the feedback collected during these experiments, a student’s performance on a quiz and proficiency relative to other students can be accurately predicted, which is a step on the path to our goal of providing automatic feedback and unintrusive evaluation in interactive learning environments.

pdf bib
A Multimodal Dataset for Deception Detection
Verónica Pérez-Rosas | Rada Mihalcea | Alexis Narvaez | Mihai Burzo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the construction of a multimodal dataset for deception detection, including physiological, thermal, and visual responses of human subjects under three deceptive scenarios. We present the experimental protocol, as well as the data acquisition process. To evaluate the usefulness of the dataset for the task of deception detection, we present a statistical analysis of the physiological and thermal modalities associated with the deceptive and truthful conditions. Initial results show that physiological and thermal responses can differentiate between deceptive and truthful states.

pdf bib
Cross-cultural Deception Detection
Verónica Pérez-Rosas | Rada Mihalcea
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Multilingual Word Sense Disambiguation Using Wikipedia
Bharath Dandala | Rada Mihalcea | Razvan Bunescu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Using N-gram and Word Network Features for Native Language Identification
Shibamouli Lahiri | Rada Mihalcea
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Utterance-Level Multimodal Sentiment Analysis
Verónica Pérez-Rosas | Rada Mihalcea | Louis-Philippe Morency
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Sense Clustering Using Wikipedia
Bharath Dandala | Chris Hokamp | Rada Mihalcea | Razvan Bunescu
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Coarse to Fine Grained Sense Disambiguation in Wikipedia
Hui Shen | Razvan Bunescu | Rada Mihalcea
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

pdf bib
CPN-CORE: A Text Semantic Similarity System Infused with Opinion Knowledge
Carmen Banea | Yoonjung Choi | Lingjia Deng | Samer Hassan | Michael Mohler | Bishan Yang | Claire Cardie | Rada Mihalcea | Jan Wiebe
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

pdf bib
Measuring Semantic Relatedness using Multilingual Representations
Samer Hassan | Carmen Banea | Rada Mihalcea
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Towards Building a Multilingual Semantic Network: Identifying Interlingual Links in Wikipedia
Bharath Dandala | Rada Mihalcea | Razvan Bunescu
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
SemEval-2012 Task 1: English Lexical Simplification
Lucia Specia | Sujay Kumar Jauhar | Rada Mihalcea
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
UNT: A Supervised Synergistic Approach to Semantic Text Similarity
Carmen Banea | Samer Hassan | Michael Mohler | Rada Mihalcea
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Lyrics, Music, and Emotions
Rada Mihalcea | Carlo Strapparava
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Multilingual Natural Language Processing
Rada Mihalcea
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data

pdf bib
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature
David Elson | Anna Kazantseva | Rada Mihalcea | Stan Szpakowicz
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

pdf bib
Multimodal Sentiment Analysis
Rada Mihalcea
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

pdf bib
A Parallel Corpus of Music and Lyrics Annotated with Emotions
Carlo Strapparava | Rada Mihalcea | Alberto Battocchi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we introduce a novel parallel corpus of music and lyrics, annotated with emotions at line level. We first describe the corpus, consisting of 100 popular songs, each of them including a music component, provided in the MIDI format, as well as a lyrics component, made available as raw text. We then describe our work on enhancing this corpus with emotion annotations using crowdsourcing. We also present some initial experiments on emotion classification using the music and the lyrics representations of the songs, which lead to encouraging results, thus demonstrating the promise of using joint music-lyric models for song processing.

pdf bib
Learning Sentiment Lexicons in Spanish
Verónica Pérez-Rosas | Carmen Banea | Rada Mihalcea
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present a framework to derive sentiment lexicons in a target language by using manually or automatically annotated data available in an electronic resource rich language, such as English. We show that bridging the language gap using the multilingual sense-level aligned WordNet structure allows us to generate a high accuracy (90%) polarity lexicon comprising 1,347 entries, and a disjoint lower accuracy (74%) one encompassing 2,496 words. By using an LSA-based vectorial expansion for the generated lexicons, we are able to obtain an average F-measure of 66% in the target language. This implies that the lexicons could be used to bootstrap higher-coverage lexicons using in-language resources.

pdf bib
Unsupervised Word Sense Disambiguation with Multilingual Representations
Erwin Fernandez-Ordoñez | Rada Mihalcea | Samer Hassan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we investigate the role of multilingual features in improving word sense disambiguation. In particular, we explore the use of semantic clues derived from context translation to enrich the intended sense and therefore reduce ambiguity. Our experiments demonstrate up to 26% increase in disambiguation accuracy by utilizing multilingual features as compared to the monolingual baseline.

pdf bib
Sense and Reference Disambiguation in Wikipedia
Hui Shen | Razvan Bunescu | Rada Mihalcea
Proceedings of COLING 2012: Posters

pdf bib
Word Epoch Disambiguation: Finding How Words Change Over Time
Rada Mihalcea | Vivi Nastase
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Multilingual Subjectivity and Sentiment Analysis
Rada Mihalcea | Carmen Banea | Janyce Wiebe
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2011

pdf bib
Word Sense Disambiguation with Multilingual Features
Carmen Banea | Rada Mihalcea
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf bib
Measuring the semantic relatedness between words and images
Chee Wee Leong | Rada Mihalcea
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

pdf bib
Improving the Impact of Subjectivity Word Sense Disambiguation on Contextual Opinion Analysis
Cem Akkaya | Janyce Wiebe | Alexander Conrad | Rada Mihalcea
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

pdf bib
Topic Modeling on Historical Newspapers
Tze-I Yang | Andrew Torget | Rada Mihalcea
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

pdf bib
Sense-level Subjectivity in a Multilingual Setting
Carmen Banea | Rada Mihalcea | Janyce Wiebe
Proceedings of the Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2011)

pdf bib
Going Beyond Text: A Hybrid Image-Text Approach for Measuring Word Relatedness
Chee Wee Leong | Rada Mihalcea
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf bib
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Dekang Lin | Yuji Matsumoto | Rada Mihalcea
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments
Michael Mohler | Razvan Bunescu | Rada Mihalcea
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Dekang Lin | Yuji Matsumoto | Rada Mihalcea
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
An Efficient Indexer for Large N-Gram Corpora
Hakan Ceylan | Rada Mihalcea
Proceedings of the ACL-HLT 2011 System Demonstrations

2010

pdf bib
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
Rada Mihalcea | Ravi Sinha | Diana McCarthy
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Quantifying the Limits and Success of Extractive Summarization Systems Across Domains
Hakan Ceylan | Rada Mihalcea | Umut Özertem | Elena Lloret | Manuel Palomar
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Amazon Mechanical Turk for Subjectivity Word Sense Disambiguation
Cem Akkaya | Alexander Conrad | Janyce Wiebe | Rada Mihalcea
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

pdf bib
Cross Language Text Classification by Model Translation and Semi-Supervised Learning
Lei Shi | Rada Mihalcea | Mingjun Tian
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Multilingual Subjectivity: Are More Languages Better?
Carmen Banea | Rada Mihalcea | Janyce Wiebe
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Text Mining for Automatic Image Tagging
Chee Wee Leong | Rada Mihalcea | Samer Hassan
Coling 2010: Posters

2009

pdf bib
The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language
Rada Mihalcea | Carlo Strapparava
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf bib
Text-to-Text Semantic Similarity for Automatic Short Answer Grading
Michael Mohler | Rada Mihalcea
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
Philipp Koehn | Rada Mihalcea
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Subjectivity Word Sense Disambiguation
Cem Akkaya | Janyce Wiebe | Rada Mihalcea
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge
Samer Hassan | Rada Mihalcea
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Integrating Knowledge for Subjectivity Sense Labeling
Yaw Gyamfi | Janyce Wiebe | Rada Mihalcea | Cem Akkaya
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Topic Identification Using Wikipedia Graph Centrality
Kino Coursey | Rada Mihalcea
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Using Encyclopedic Knowledge for Automatic Topic Identification
Kino Coursey | Rada Mihalcea | William Moen
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

pdf bib
SemEval-2010 Task 2: Cross-Lingual Lexical Substitution
Ravi Sinha | Diana McCarthy | Rada Mihalcea
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

pdf bib
Explorations in Automatic Image Annotation using Textual Features
Chee Wee Leong | Rada Mihalcea
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
Learning to Identify Educational Materials
Samer Hassan | Rada Mihalcea
Proceedings of the International Conference RANLP-2009

pdf bib
Combining Lexical Resources for Contextual Synonym Expansion
Ravi Sinha | Rada Mihalcea
Proceedings of the International Conference RANLP-2009

2008

pdf bib
Multilingual Subjectivity Analysis Using Machine Translation
Carmen Banea | Rada Mihalcea | Janyce Wiebe | Samer Hassan
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
How to Add a New Language on the NLP Map: Building Resources and Tools for Languages with Scarce Resources
Rada Mihalcea | Vivi Nastase
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Babylon Parallel Text Builder: Gathering Parallel Texts for Low-Density Languages
Michael Mohler | Rada Mihalcea
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes Babylon, a system that attempts to overcome the shortage of parallel texts in low-density languages by supplementing existing parallel texts with texts gathered automatically from the Web. In addition to the identification of entire Web pages, we also propose a new feature specifically designed to find parallel text chunks within a single document. Experiments carried out on the Quechua-Spanish language pair show that the system is successful in automatically identifying a significant amount of parallel texts on the Web. Evaluations of a machine translation system trained on this corpus indicate that the Web-gathered parallel texts can supplement manually compiled parallel texts and perform significantly better than the manually compiled texts when tested on other Web-gathered data.

pdf bib
A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources
Carmen Banea | Rada Mihalcea | Janyce Wiebe
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper introduces a method for creating a subjectivity lexicon for languages with scarce resources. The method is able to build a subjectivity lexicon by using a small seed set of subjective words, an online dictionary, and a small raw corpus, coupled with a bootstrapping process that ranks new candidate words based on a similarity measure. Experiments performed with a rule-based sentence level subjectivity classifier show an 18% absolute improvement in F-measure as compared to previously proposed semi-supervised methods.

pdf bib
Book Reviews: The Text Mining Handbook: Advanced Approaches to Analyzing Unstructured Data by Ronen Feldman and James Sanger
Rada Mihalcea
Computational Linguistics, Volume 34, Number 1, March 2008

pdf bib
Linguistically Motivated Features for Enhanced Back-of-the-Book Indexing
Andras Csomai | Rada Mihalcea
Proceedings of ACL-08: HLT

2007

pdf bib
Using Wikipedia for Automatic Word Sense Disambiguation
Rada Mihalcea
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing
Chris Biemann | Irina Matveeva | Rada Mihalcea | Dragomir Radev
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing

pdf bib
SemEval-2007 Task 14: Affective Text
Carlo Strapparava | Rada Mihalcea
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
UNT-Yahoo: SuperSenseLearner: Combining SenseLearner with SuperSense and other Coarse Semantic Features
Rada Mihalcea | Andras Csomai | Massimiliano Ciaramita
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
UNT: SubFinder: Combining Knowledge Sources for Automatic Lexical Substitution
Samer Hassan | Andras Csomai | Carmen Banea | Ravi Sinha | Rada Mihalcea
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

pdf bib
Explorations in Automatic Book Summarization
Rada Mihalcea | Hakan Ceylan
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Learning Multilingual Subjective Language via Cross-Lingual Projections
Rada Mihalcea | Carmen Banea | Janyce Wiebe
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf bib
Word Sense and Subjectivity
Janyce Wiebe | Rada Mihalcea
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing
Rada Mihalcea | Dragomir Radev
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing

pdf bib
Toward Communicating Simple Sentences Using Pictorial Representations
Rada Mihalcea | Ben Leong
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper evaluates the hypothesis that pictorial representations can be used to effectively convey simple sentences across language barriers. Comparative evaluations show that a considerable amount of understanding can be achieved using visual descriptions of information, with evaluation figures within a comparable range of those obtained with linguistic representations produced by an automatic machine translation system.

pdf bib
Graph-based Algorithms for Natural Language Processing and Information Retrieval
Rada Mihalcea | Dragomir Radev
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2005

pdf bib
Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-based Algorithms for Sequence Data Labeling
Rada Mihalcea
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Making Computers Laugh: Investigations in Automatic Humor Recognition
Rada Mihalcea | Carlo Strapparava
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the ACL Workshop on Building and Using Parallel Texts
Philipp Koehn | Joel Martin | Rada Mihalcea | Christof Monz | Ted Pedersen
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf bib
Word Alignment for Languages with Scarce Resources
Joel Martin | Rada Mihalcea | Ted Pedersen
Proceedings of the ACL Workshop on Building and Using Parallel Texts

pdf bib
Measuring the Semantic Similarity of Texts
Courtney Corley | Rada Mihalcea
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment

pdf bib
Language Independent Extractive Summarization
Rada Mihalcea
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib
SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text
Rada Mihalcea | Andras Csomai
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib
A Language Independent Algorithm for Single and Multiple Document Summarization
Rada Mihalcea | Paul Tarau
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

pdf bib
The Senseval-3 Multilingual English-Hindi lexical sample task
Timothy Chklovski | Rada Mihalcea | Ted Pedersen | Amruta Purandare
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
The Senseval-3 English lexical sample task
Rada Mihalcea | Timothy Chklovski | Adam Kilgarriff
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
An evaluation exercise for Romanian Word Sense Disambiguation
Rada Mihalcea | Vivi Năstase | Timothy Chklovski | Doina Tătar | Dan Tufiş | Florentina Hristea
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
SenseLearner: Minimally supervised Word Sense Disambiguation for all words in open text
Rada Mihalcea | Ehsanul Faruque
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

pdf bib
An algorithm for open text semantic parsing
Lei Shi | Rada Mihalcea
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)

pdf bib
Co-training and Self-training for Word Sense Disambiguation
Rada Mihalcea
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004

pdf bib
TextRank: Bringing Order into Text
Rada Mihalcea | Paul Tarau
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

pdf bib
Open Text Semantic Parsing Using FrameNet and WordNet
Lei Shi | Rada Mihalcea
Demonstration Papers at HLT-NAACL 2004

pdf bib
Graph-based Ranking Algorithms for Sentence Extraction, Applied to Text Summarization
Rada Mihalcea
Proceedings of the ACL Interactive Poster and Demonstration Sessions

pdf bib
Finding Semantic Associations on Express Lane
Vivi Năstase | Rada Mihalcea
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
PageRank on Semantic Networks, with Application to Word Sense Disambiguation
Rada Mihalcea | Paul Tarau | Elizabeth Figa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
An Evaluation Exercise for Word Alignment
Rada Mihalcea | Ted Pedersen
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

pdf bib
Open Mind Word Expert: Creating Large Annotated Data Collections with Web Users’ Help
Rada Mihalcea | Timothy Chklovski
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

2002

pdf bib
Bootstrapping Large Sense Tagged Corpora
Rada F. Mihalcea
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Building a Sense Tagged Corpus with Open Mind Word Expert
Timothy Chklovski | Rada Mihalcea
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

pdf bib
Letter Level Learning for Language Independent Diacritics Restoration
Rada Mihalcea | Vivi Nastase
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

pdf bib
Instance Based Learning with Automatic Feature Selection Applied to Word Sense Disambiguation
Rada Mihalcea
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering
Sanda Harabagiu | Dan Moldovan | Marius Pasca | Rada Mihalcea | Mihai Surdeanu | Razvan Bunescu | Roxana Girju | Vasile Rus | Paul Morarescu
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

pdf bib
Pattern Learning and Active Feature Selection for Word Sense Disambiguation
Rada F. Mihalcea | Dan I. Moldovan
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

pdf bib
The Structure and Performance of an Open-Domain Question Answering System
Dan Moldovan | Sanda Harabagiu | Marius Pasca | Rada Mihalcea | Roxana Girju | Richard Goodrum | Vasile Rus
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf bib
Semantic Indexing using WordNet Senses
Rada Mihalcea | Dan Moldovan
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval

1999

pdf bib
A Method for Word Sense Disambiguation of Unrestricted Text
Rada Mihalcea | Dan I. Moldovan
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

pdf bib
Word Sense Disambiguation based on Semantic Density
Rada Mihalcea | Dan I. Moldovan
Usage of WordNet in Natural Language Processing Systems
