Debanjan Ghosh


2024

Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)
Debanjan Ghosh | Smaranda Muresan | Anna Feldman | Tuhin Chakrabarty | Emmy Liu
Proceedings of the 4th Workshop on Figurative Language Processing (FigLang 2024)

Identifying Fairness Issues in Automatically Generated Testing Content
Kevin Stowe | Benny Longwill | Alyssa Francis | Tatsuya Aoyama | Debanjan Ghosh | Swapna Somasundaran
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Natural language generation tools are powerful and effective for generating content. However, language models are known to display bias and fairness issues, making them impractical to deploy for many use cases. Here, we focus on how fairness issues impact automatically generated test content, which can have stringent requirements to ensure the test measures only what it was intended to measure. Specifically, we review test content generated for a large-scale standardized English proficiency test with the goal of identifying content that only pertains to a certain subset of the test population as well as content that has the potential to be upsetting or distracting to some test takers. Issues like these could inadvertently impact a test taker’s score and thus should be avoided. This kind of content does not reflect the more commonly acknowledged biases, making it challenging even for modern models that contain safeguards. We build a dataset of 601 generated texts annotated for fairness and explore a variety of methods for classification: fine-tuning, topic-based classification, and prompting, including few-shot and self-correcting prompts. We find that combining prompt self-correction and few-shot learning performs best, yielding an F1 score of 0.79 on our held-out test set, while much smaller BERT- and topic-based models have competitive performance on out-of-domain data.

2023

The Benefits of Label-Description Training for Zero-Shot Text Classification
Lingyu Gao | Debanjan Ghosh | Kevin Gimpel
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Pretrained language models have improved zero-shot text classification by allowing the transfer of semantic knowledge from the training data in order to classify among specific label sets in downstream tasks. We propose a simple way to further improve zero-shot accuracies with minimal effort. We curate small finetuning datasets intended to describe the labels for a task. Unlike typical finetuning data, which has texts annotated with labels, our data simply describes the labels in language, e.g., using a few related terms, dictionary/encyclopedia entries, and short templates. Across a range of topic and sentiment datasets, our method is more accurate than zero-shot by 17-19% absolute. It is also more robust to choices required for zero-shot classification, such as patterns for prompting the model to classify and mappings from labels to tokens in the model’s vocabulary. Furthermore, since our data merely describes the labels but does not use input texts, finetuning on it yields a model that performs strongly on multiple text domains for a given label set, even improving over few-shot out-of-domain classification in multiple settings.

2022

FLUTE: Figurative Language Understanding through Textual Explanations
Tuhin Chakrabarty | Arkadiy Saakyan | Debanjan Ghosh | Smaranda Muresan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Figurative language understanding has recently been framed as a recognizing textual entailment (RTE) task (a.k.a. natural language inference (NLI)). However, similar to classical RTE/NLI datasets, existing figurative language datasets suffer from spurious correlations and annotation artifacts. To tackle this problem, work on NLI has built explanation-based datasets such as eSNLI, allowing us to probe whether language models are right for the right reasons. Yet no such data exists for figurative language, making it harder to assess genuine understanding of such expressions. To address this issue, we release FLUTE, a dataset of 9,000 figurative NLI instances with explanations, spanning four categories: Sarcasm, Simile, Metaphor, and Idioms. We collect the data through a Human-AI collaboration framework based on GPT-3, crowd workers, and expert annotators. We show how utilizing GPT-3 in conjunction with human annotators (novices and experts) can aid in scaling up the creation of datasets even for such complex linguistic phenomena as figurative language. The baseline performance of the T5 model fine-tuned on FLUTE shows that our dataset can bring us a step closer to developing models that understand figurative language through textual explanations.

AGReE: A system for generating Automated Grammar Reading Exercises
Sophia Chan | Swapna Somasundaran | Debanjan Ghosh | Mengxuan Zhao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We describe the AGReE system, which takes user-submitted passages as input and automatically generates grammar practice exercises that can be completed while reading. Multiple-choice practice items are generated for a variety of different grammar constructs: punctuation, articles, conjunctions, pronouns, prepositions, verbs, and nouns. We also conducted a large-scale human evaluation with around 4,500 multiple-choice practice items. We find that for 95% of items, a majority of the five raters were able to identify the correct answer, and for 85% of cases, raters agreed that there is only one correct answer among the choices. Finally, the error analysis shows that raters made the most mistakes for punctuation and conjunctions.

Controlled Language Generation for Language Learning Items
Kevin Stowe | Debanjan Ghosh | Mengxuan Zhao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track

This work aims to employ natural language generation (NLG) to rapidly generate items for English language learning applications. This requires both language models capable of generating fluent, high-quality English and the ability to control the generated output to match the requirements of the relevant items. We experiment with deep pretrained models for this task, developing novel methods for controlling items for factors relevant in language learning: diverse sentences for different proficiency levels and argument structure to test grammar. Human evaluation demonstrates high grammaticality scores for all models (3.4 and above out of 4), and higher length (24%) and complexity (9%) over the baseline for the advanced proficiency model. Our results show that we can achieve strong performance while adding control to ensure diverse, tailored content for individual users.

“What makes a question inquisitive?” A Study on Type-Controlled Inquisitive Question Generation
Lingyu Gao | Debanjan Ghosh | Kevin Gimpel
Proceedings of the 11th Joint Conference on Lexical and Computational Semantics

We propose a type-controlled framework for inquisitive question generation. We annotate an inquisitive question dataset with question types, train question type classifiers, and finetune models for type-controlled question generation. Empirical results demonstrate that we can generate a variety of questions that adhere to specific types while drawing from the source texts. We also investigate strategies for selecting a single question from a generated set, considering both an informative vs. inquisitive question classifier and a pairwise ranker trained from a small set of expert annotations. Question selection using the pairwise ranker yields strong results in automatic and manual evaluation. Our human evaluation assesses multiple aspects of the generated questions, finding that the ranker chooses questions with the best syntax (4.59), semantics (4.37), and inquisitiveness (3.92) on a scale of 1-5, even rivaling the performance of human-written questions.

Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
Debanjan Ghosh | Beata Beigman Klebanov | Smaranda Muresan | Anna Feldman | Soujanya Poria | Tuhin Chakrabarty
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

A Report on the FigLang 2022 Shared Task on Understanding Figurative Language
Arkadiy Saakyan | Tuhin Chakrabarty | Debanjan Ghosh | Smaranda Muresan
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)

We present the results of the Shared Task on Understanding Figurative Language that we conducted as a part of the 3rd Workshop on Figurative Language Processing (FigLang 2022) at EMNLP 2022. The shared task is based on the FLUTE dataset (Chakrabarty et al., 2022), which consists of NLI pairs containing figurative language along with free text explanations for each NLI instance. The task challenged participants to build models that are able to not only predict the right label for a figurative NLI instance, but also generate a convincing free-text explanation. The participants were able to significantly improve upon provided baselines in both automatic and human evaluation settings. We further summarize the submitted systems and discuss the evaluation results.

2021

“Sharks are not the threat humans are”: Argument Component Segmentation in School Student Essays
Tariq Alhindi | Debanjan Ghosh
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

Argument mining is often addressed by a pipeline method where segmentation of text into argumentative units is conducted first and followed by an argument component identification task. In this research, we apply a token-level classification to identify claim and premise tokens from a new corpus of argumentative essays written by middle school students. To this end, we compare a variety of state-of-the-art models such as discrete features and deep learning architectures (e.g., BiLSTM networks and BERT-based architectures) to identify the argument components. We demonstrate that a BERT-based multi-task learning architecture (i.e., token and sentence level classification) adaptively pretrained on a relevant unlabeled dataset obtains the best results.

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space
Debanjan Ghosh | Ritvik Shrivastava | Smaranda Muresan
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Detecting arguments in online interactions is useful to understand how conflicts arise and get resolved. Users often use figurative language, such as sarcasm, either as persuasive devices or to attack the opponent by an ad hominem argument. To further our understanding of the role of sarcasm in shaping the disagreement space, we present a thorough experimental setup using a corpus annotated with both argumentative moves (agree/disagree) and sarcasm. We exploit joint modeling in terms of (a) applying discrete features that are useful in detecting sarcasm to the task of argumentative relation classification (agree/disagree/none), and (b) multitask learning for argumentative relation classification and sarcasm detection using deep learning architectures (e.g., dual Long Short-Term Memory (LSTM) with hierarchical attention and Transformer-based architectures). We demonstrate that modeling sarcasm improves the argumentative relation classification task (agree/disagree/none) in all setups.

Figurative Language in Recognizing Textual Entailment
Tuhin Chakrabarty | Debanjan Ghosh | Adam Poliak | Smaranda Muresan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

An Exploratory Study of Argumentative Writing by Young Students: A transformer-based Approach
Debanjan Ghosh | Beata Beigman Klebanov | Yi Song
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present a computational exploration of argument critique writing by young students. Middle school students were asked to criticize an argument presented in the prompt, focusing on identifying and explaining the reasoning flaws. This task resembles an established college-level argument critique task. Lexical and discourse features that utilize detailed domain knowledge to identify critiques exist for the college task but do not perform well on the young students’ data. Instead, a transformer-based architecture (e.g., BERT) fine-tuned on a large corpus of critique essays from the college task performs much better (over 20% improvement in F1 score). Analysis of the performance of various configurations of the system suggests that while children’s writing does not exhibit the standard discourse structure of an argumentative essay, it does share basic local sequential structures with that of more mature writers.

R^3: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge
Tuhin Chakrabarty | Debanjan Ghosh | Smaranda Muresan | Nanyun Peng
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose an unsupervised approach for sarcasm generation based on a non-sarcastic input sentence. Our method employs a retrieve-and-edit framework to instantiate two major characteristics of sarcasm: reversal of valence and semantic incongruity with the context, which could include shared commonsense or world knowledge between the speaker and the listener. While prior works on sarcasm generation predominantly focus on context incongruity, we show that combining valence reversal and semantic incongruity based on the commonsense knowledge generates sarcasm of higher quality. Human evaluation shows that our system generates sarcasm better than humans 34% of the time, and better than a reinforced hybrid baseline 90% of the time.

Interpreting Verbal Irony: Linguistic Strategies and the Connection to the Type of Semantic Incongruity
Debanjan Ghosh | Elena Musi | Smaranda Muresan
Proceedings of the Society for Computation in Linguistics 2020

Proceedings of the Second Workshop on Figurative Language Processing
Beata Beigman Klebanov | Ekaterina Shutova | Patricia Lichtenstein | Smaranda Muresan | Chee Wee | Anna Feldman | Debanjan Ghosh
Proceedings of the Second Workshop on Figurative Language Processing

A Report on the 2020 Sarcasm Detection Shared Task
Debanjan Ghosh | Avijit Vajpayee | Smaranda Muresan
Proceedings of the Second Workshop on Figurative Language Processing

Detecting sarcasm and verbal irony is critical for understanding people’s actual sentiments and beliefs. Thus, the field of sarcasm analysis has become a popular research problem in natural language processing. As the community working on computational approaches for sarcasm detection is growing, it is imperative to conduct benchmarking studies to analyze the current state-of-the-art, facilitating progress in this area. We report on the shared task on sarcasm detection we conducted as a part of the 2nd Workshop on Figurative Language Processing (FigLang 2020) at ACL 2020.

2018

Sarcasm Analysis Using Conversation Context
Debanjan Ghosh | Alexander R. Fabbri | Smaranda Muresan
Computational Linguistics, Volume 44, Issue 4 - December 2018

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker’s sarcastic intent is not always apparent without additional context. Focusing on social media discussions, we investigate three issues: (1) does modeling conversation context help in sarcasm detection? (2) can we identify what part of conversation context triggered the sarcastic reply? and (3) given a sarcastic post that contains multiple sentences, can we identify the specific sentence that is sarcastic? To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the current turn. We show that LSTM networks with sentence-level attention on context and current turn, as well as the conditional LSTM network, outperform the LSTM model that reads only the current turn. As conversation context, we consider the prior turn, the succeeding turn, or both. Our computational models are tested on two types of social media platforms: Twitter and discussion forums. We discuss several differences between these data sets, ranging from their size to the nature of the gold-label annotations. To address the latter two issues, we present a qualitative analysis of the attention weights produced by the LSTM models (with attention) and discuss the results compared with human performance on the two tasks.

2017

The Role of Conversation Context for Sarcasm Detection in Online Interactions
Debanjan Ghosh | Alexander Richard Fabbri | Smaranda Muresan
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker’s sarcastic intent is not always obvious without additional context. Focusing on social media discussions, we investigate two issues: (1) does modeling of conversation context help in sarcasm detection and (2) can we understand what part of conversation context triggered the sarcastic reply. To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the sarcastic response. We show that the conditional LSTM network (Rocktäschel et al. 2015) and LSTM networks with sentence-level attention on context and response outperform the LSTM model that reads only the response. To address the second issue, we present a qualitative analysis of attention weights produced by the LSTM models with attention and discuss the results compared with human performance on the task.

2016

Towards Feasible Guidelines for the Annotation of Argument Schemes
Elena Musi | Debanjan Ghosh | Smaranda Muresan
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Coarse-grained Argumentation Features for Scoring Persuasive Essays
Debanjan Ghosh | Aquila Khanam | Yubo Han | Smaranda Muresan
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Sarcastic or Not: Word Embeddings to Predict the Literal or Sarcastic Meaning of Words
Debanjan Ghosh | Weiwei Guo | Smaranda Muresan
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Analyzing Argumentative Discourse Units in Online Interactions
Debanjan Ghosh | Smaranda Muresan | Nina Wacholder | Mark Aakhus | Matthew Mitsui
Proceedings of the First Workshop on Argumentation Mining

Annotating Multiparty Discourse: Challenges for Agreement Metrics
Nina Wacholder | Smaranda Muresan | Debanjan Ghosh | Mark Aakhus
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

2012

Relation Classification using Entity Sequence Kernels
Debanjan Ghosh | Smaranda Muresan
Proceedings of COLING 2012: Posters

2011

Using Sequence Kernels to identify Opinion Entities in Urdu
Smruthi Mukund | Debanjan Ghosh | Rohini Srihari
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

2010

Using Cross-Lingual Projections to Generate Semantic Role Labeled Annotated Corpus for Urdu - A Resource Poor Language
Smruthi Mukund | Debanjan Ghosh | Rohini Srihari
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)