Tuhin Chakrabarty


pdf bib
It’s not Rocket Science: Interpreting Figurative Language in Narratives
Tuhin Chakrabarty | Yejin Choi | Vered Shwartz
Transactions of the Association for Computational Linguistics, Volume 10

Figurative language is ubiquitous in English. Yet, the vast majority of NLP research focuses on literal language. Existing text representations by design rely on compositionality, while figurative language is often non- compositional. In this paper, we study the interpretation of two non-compositional figurative languages (idioms and similes). We collected datasets of fictional narratives containing a figurative expression along with crowd-sourced plausible and implausible continuations relying on the correct interpretation of the expression. We then trained models to choose or generate the plausible continuation. Our experiments show that models based solely on pre-trained language models perform substantially worse than humans on these tasks. We additionally propose knowledge-enhanced models, adopting human strategies for interpreting figurative language types: inferring meaning from the context and relying on the constituent words’ literal meanings. The knowledge-enhanced models improve the performance on both the discriminative and generative tasks, further bridging the gap from human performance.


pdf bib
COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic
Arkadiy Saakyan | Tuhin Chakrabarty | Smaranda Muresan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We introduce a FEVER-like dataset COVID-Fact of 4,086 claims concerning the COVID-19 pandemic. The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence. Unlike previous approaches, we automatically detect true claims and their source articles and then generate counter-claims using automatic methods rather than employing human annotators. Along with our constructed resource, we formally present the task of identifying relevant evidence for the claims and verifying whether the evidence refutes or supports a given claim. In addition to scientific claims, our data contains simplified general claims from media sources, making it better suited for detecting general misinformation regarding COVID-19. Our experiments indicate that COVID-Fact will provide a challenging testbed for the development of new systems and our approach will reduce the costs of building domain-specific datasets for detecting misinformation.

pdf bib
Metaphor Generation with Conceptual Mappings
Kevin Stowe | Tuhin Chakrabarty | Nanyun Peng | Smaranda Muresan | Iryna Gurevych
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Generating metaphors is a difficult task as it requires understanding nuanced relationships between abstract concepts. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Guided by conceptual metaphor theory, we propose to control the generation process by encoding conceptual mappings between cognitive domains to generate meaningful metaphoric expressions. To achieve this, we develop two methods: 1) using FrameNet-based embeddings to learn mappings between domains and applying them at the lexical level (CM-Lex), and 2) deriving source/target pairs to train a controlled seq-to-seq generation model (CM-BART). We assess our methods through automatic and human evaluation for basic metaphoricity and conceptual metaphor presence. We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems, and CM-BART outperforms all other models both in automatic and human evaluations.

pdf bib
Figurative Language in Recognizing Textual Entailment
Tuhin Chakrabarty | Debanjan Ghosh | Adam Poliak | Smaranda Muresan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Identifying Distributional Perspectives from Colingual Groups
Yufei Tian | Tuhin Chakrabarty | Fred Morstatter | Nanyun Peng
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

Discrepancies exist among different cultures or languages. A lack of mutual understanding among different colingual groups about the perspectives on specific values or events may lead to uninformed decisions or biased opinions. Thus, automatically understanding the group perspectives can provide essential back-ground for many natural language processing tasks. In this paper, we study colingual groups and use language corpora as a proxy to identify their distributional perspectives. We present a novel computational approach to learn shared understandings, and benchmark our method by building culturally-aware models for the English, Chinese, and Japanese languages. Ona held out set of diverse topics, including marriage, corruption, democracy, etc., our model achieves high correlation with human judgements regarding intra-group values and inter-group differences

pdf bib
MERMAID: Metaphor Generation with Symbolism and Discriminative Decoding
Tuhin Chakrabarty | Xurui Zhang | Smaranda Muresan | Nanyun Peng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generating metaphors is a challenging task as it requires a proper understanding of abstract concepts, making connections between unrelated concepts, and deviating from the literal meaning. In this paper, we aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs. Based on a theoretically-grounded connection between metaphors and symbols, we propose a method to automatically construct a parallel corpus by transforming a large number of metaphorical sentences from the Gutenberg Poetry corpus (CITATION) to their literal counterpart using recent advances in masked language modeling coupled with commonsense inference. For the generation task, we incorporate a metaphor discriminator to guide the decoding of a sequence to sequence model fine-tuned on our parallel data to generate high-quality metaphors. Human evaluation on an independent test set of literal statements shows that our best model generates metaphors better than three well-crafted baselines 66% of the time on average. A task-based evaluation shows that human-written poems enhanced with metaphors proposed by our model are preferred 68% of the time compared to poems without metaphors.

pdf bib
ENTRUST: Argument Reframing with Language Models and Entailment
Tuhin Chakrabarty | Christopher Hidey | Smaranda Muresan
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Framing involves the positive or negative presentation of an argument or issue depending on the audience and goal of the speaker. Differences in lexical framing, the focus of our work, can have large effects on peoples’ opinions and beliefs. To make progress towards reframing arguments for positive effects, we create a dataset and method for this task. We use a lexical resource for “connotations” to create a parallel corpus and propose a method for argument reframing that combines controllable text generation (positive connotation) with a post-decoding entailment component (same denotation). Our results show that our method is effective compared to strong baselines along the dimensions of fluency, meaning, and trustworthiness/reduction of fear.

pdf bib
DiSCoL: Toward Engaging Dialogue Systems through Conversational Line Guided Response Generation
Sarik Ghazarian | Zixi Liu | Tuhin Chakrabarty | Xuezhe Ma | Aram Galstyan | Nanyun Peng
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

Having engaging and informative conversations with users is the utmost goal for open-domain conversational systems. Recent advances in transformer-based language models and their applications to dialogue systems have succeeded to generate fluent and human-like responses. However, they still lack control over the generation process towards producing contentful responses and achieving engaging conversations. To achieve this goal, we present DiSCoL (Dialogue Systems through Coversational Line guided response generation). DiSCoL is an open-domain dialogue system that leverages conversational lines (briefly convlines) as controllable and informative content-planning elements to guide the generation model produce engaging and informative responses. Two primary modules in DiSCoL’s pipeline are conditional generators trained for 1) predicting relevant and informative convlines for dialogue contexts and 2) generating high-quality responses conditioned on the predicted convlines. Users can also change the returned convlines to control the direction of the conversations towards topics that are more interesting for them. Through automatic and human evaluations, we demonstrate the efficiency of the convlines in producing engaging conversations.

pdf bib
Implicit Premise Generation with Discourse-aware Commonsense Knowledge Models
Tuhin Chakrabarty | Aadit Trivedi | Smaranda Muresan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Enthymemes are defined as arguments where a premise or conclusion is left implicit. We tackle the task of generating the implicit premise in an enthymeme, which requires not only an understanding of the stated conclusion and premise but also additional inferences that could depend on commonsense knowledge. The largest available dataset for enthymemes (Habernal et al., 2018) consists of 1.7k samples, which is not large enough to train a neural text generation model. To address this issue, we take advantage of a similar task and dataset: Abductive reasoning in narrative text (Bhagavatula et al., 2020). However, we show that simply using a state-of-the-art seq2seq model fine-tuned on this data might not generate meaningful implicit premises associated with the given enthymemes. We demonstrate that encoding discourse-aware commonsense during fine-tuning improves the quality of the generated implicit premises and outperforms all other baselines both in automatic and human evaluations on three different datasets.

pdf bib
Don’t Go Far Off: An Empirical Study on Neural Poetry Translation
Tuhin Chakrabarty | Arkadiy Saakyan | Smaranda Muresan
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Despite constant improvements in machine translation quality, automatic poetry translation remains a challenging problem due to the lack of open-sourced parallel poetic corpora, and to the intrinsic complexities involved in preserving the semantics, style and figurative nature of poetry. We present an empirical investigation for poetry translation along several dimensions: 1) size and style of training data (poetic vs. non-poetic), including a zero-shot setup; 2) bilingual vs. multilingual learning; and 3) language-family-specific models vs. mixed-language-family models. To accomplish this, we contribute a parallel dataset of poetry translations for several language pairs. Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size, both in terms of automatic metrics (BLEU, BERTScore, COMET) and human evaluation metrics such as faithfulness (meaning and poetic style). Moreover, multilingual fine-tuning on poetic data outperforms bilingual fine-tuning on poetic data.


pdf bib
Content Planning for Neural Story Generation with Aristotelian Rescoring
Seraphina Goldfarb-Tarrant | Tuhin Chakrabarty | Ralph Weischedel | Nanyun Peng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Long-form narrative text generated from large language models manages a fluent impersonation of human writing, but only at the local sentence level, and lacks structure or global cohesion. We posit that many of the problems of story generation can be addressed via high-quality content planning, and present a system that focuses on how to learn good plot structures to guide story generation. We utilize a plot-generation language model along with an ensemble of rescoring models that each implement an aspect of good story-writing as detailed in Aristotle’s Poetics. We find that stories written with our more principled plot-structure are both more relevant to a given prompt and higher quality than baselines that do not content plan, or that plan in an unprincipled way.

pdf bib
Generating similes effortlessly like a Pro: A Style Transfer Approach for Simile Generation
Tuhin Chakrabarty | Smaranda Muresan | Nanyun Peng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Literary tropes, from poetry to stories, are at the crux of human imagination and communication. Figurative language such as a simile go beyond plain expressions to give readers new insights and inspirations. In this paper, we tackle the problem of simile generation. Generating a simile requires proper understanding for effective mapping of properties between two concepts. To this end, we first propose a method to automatically construct a parallel corpus by transforming a large number of similes collected from Reddit to their literal counterpart using structured common sense knowledge. We then propose to fine-tune a pre-trained sequence to sequence model, BART (Lewis et al 2019), on the literal-simile pairs to gain generalizability, so that we can generate novel similes given a literal sentence. Experiments show that our approach generates 88% novel similes that do not share properties with the training data. Human evaluation on an independent set of literal statements shows that our model generates similes better than two literary experts 37% of the time when compared pairwise. We also show how replacing literal sentences with similes from our best model in machine-generated stories improves evocativeness and leads to better acceptance by human judges.

pdf bib
Rˆ3: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge
Tuhin Chakrabarty | Debanjan Ghosh | Smaranda Muresan | Nanyun Peng
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose an unsupervised approach for sarcasm generation based on a non-sarcastic input sentence. Our method employs a retrieve-and-edit framework to instantiate two major characteristics of sarcasm: reversal of valence and semantic incongruity with the context, which could include shared commonsense or world knowledge between the speaker and the listener. While prior works on sarcasm generation predominantly focus on context incongruity, we show that combining valence reversal and semantic incongruity based on the commonsense knowledge generates sarcasm of higher quality. Human evaluation shows that our system generates sarcasm better than humans 34% of the time, and better than a reinforced hybrid baseline 90% of the time.

pdf bib
DeSePtion: Dual Sequence Prediction and Adversarial Examples for Improved Fact-Checking
Christopher Hidey | Tuhin Chakrabarty | Tariq Alhindi | Siddharth Varia | Kriste Krstovski | Mona Diab | Smaranda Muresan
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The increased focus on misinformation has spurred development of data and systems for detecting the veracity of a claim as well as retrieving authoritative evidence. The Fact Extraction and VERification (FEVER) dataset provides such a resource for evaluating endto- end fact-checking, requiring retrieval of evidence from Wikipedia to validate a veracity prediction. We show that current systems for FEVER are vulnerable to three categories of realistic challenges for fact-checking – multiple propositions, temporal reasoning, and ambiguity and lexical variation – and introduce a resource with these types of claims. Then we present a system designed to be resilient to these “attacks” using multiple pointer networks for document selection and jointly modeling a sequence of evidence sentences and veracity relation predictions. We find that in handling these attacks we obtain state-of-the-art results on FEVER, largely due to improved evidence retrieval.


pdf bib
IMHO Fine-Tuning Improves Claim Detection
Tuhin Chakrabarty | Christopher Hidey | Kathy McKeown
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Claims are the central component of an argument. Detecting claims across different domains or data sets can often be challenging due to their varying conceptualization. We propose to alleviate this problem by fine-tuning a language model using a Reddit corpus of 5.5 million opinionated claims. These claims are self-labeled by their authors using the internet acronyms IMO/IMHO (in my (humble) opinion). Empirical results show that using this approach improves the state of art performance across four benchmark argumentation data sets by an average of 4 absolute F1 points in claim detection. As these data sets include diverse domains such as social media and student essays this improvement demonstrates the robustness of fine-tuning on this novel corpus.

pdf bib
ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning
Tuhin Chakrabarty | Smaranda Muresan
Proceedings of the 13th International Workshop on Semantic Evaluation

Community Question Answering forums are very popular nowadays, as they represent effective means for communities to share information around particular topics. But the information shared on these forums are often not authentic. This paper presents the ColumbiaNLP submission for the SemEval-2019 Task 8: Fact-Checking in Community Question Answering Forums. We show how fine-tuning a language model on a large unannotated corpus of old threads from Qatar Living forum helps us to classify question types (factual, opinion, socializing) and to judge the factuality of answers on the shared task labeled data from the same forum. Our system finished 4th and 2nd on Subtask A (question type classification) and B (answer factuality prediction), respectively, based on the official metric of accuracy.

pdf bib
AMPERSAND: Argument Mining for PERSuAsive oNline Discussions
Tuhin Chakrabarty | Christopher Hidey | Smaranda Muresan | Kathy McKeown | Alyssa Hwang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Argumentation is a type of discourse where speakers try to persuade their audience about the reasonableness of a claim by presenting supportive arguments. Most work in argument mining has focused on modeling arguments in monologues. We propose a computational model for argument mining in online persuasive discussion forums that brings together the micro-level (argument as product) and macro-level (argument as process) models of argumentation. Fundamentally, this approach relies on identifying relations between components of arguments in a discussion thread. Our approach for relation prediction uses contextual information in terms of fine-tuning a pre-trained language model and leveraging discourse relations based on Rhetorical Structure Theory. We additionally propose a candidate selection method to automatically predict what parts of one’s argument will be targeted by other participants in the discussion. Our models obtain significant improvements compared to recent state-of-the-art approaches using pointer networks and a pre-trained language model.

pdf bib
Pay “Attention” to your Context when Classifying Abusive Language
Tuhin Chakrabarty | Kilol Gupta | Smaranda Muresan
Proceedings of the Third Workshop on Abusive Language Online

The goal of any social media platform is to facilitate healthy and meaningful interactions among its users. But more often than not, it has been found that it becomes an avenue for wanton attacks. We propose an experimental study that has three aims: 1) to provide us with a deeper understanding of current data sets that focus on different types of abusive language, which are sometimes overlapping (racism, sexism, hate speech, offensive language, and personal attacks); 2) to investigate what type of attention mechanism (contextual vs. self-attention) is better for abusive language detection using deep learning architectures; and 3) to investigate whether stacked architectures provide an advantage over simple architectures for this task.

pdf bib
Discourse Relation Prediction: Revisiting Word Pairs with Convolutional Networks
Siddharth Varia | Christopher Hidey | Tuhin Chakrabarty
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Word pairs across argument spans have been shown to be effective for predicting the discourse relation between them. We propose an approach to distill knowledge from word pairs for discourse relation classification with convolutional neural networks by incorporating joint learning of implicit and explicit relations. Our novel approach of representing the input as word pairs achieves state-of-the-art results on four-way classification of both implicit and explicit relations as well as one of the binary classification tasks. For explicit relation prediction, we achieve around 20% error reduction on the four-way task. At the same time, compared to a two-layered Bi-LSTM-CRF model, our model is able to achieve these results with half the number of learnable parameters and approximately half the amount of training time.


pdf bib
Robust Document Retrieval and Individual Evidence Modeling for Fact Extraction and Verification.
Tuhin Chakrabarty | Tariq Alhindi | Smaranda Muresan
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

This paper presents the ColumbiaNLP submission for the FEVER Workshop Shared Task. Our system is an end-to-end pipeline that extracts factual evidence from Wikipedia and infers a decision about the truthfulness of the claim based on the extracted evidence. Our pipeline achieves significant improvement over the baseline for all the components (Document Retrieval, Sentence Selection and Textual Entailment) both on the development set and the test set. Our team finished 6th out of 24 teams on the leader-board based on the preliminary results with a FEVER score of 49.06 on the blind test set compared to 27.45 of the baseline system.