Ritwik Banerjee


2024

Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online Discourse
Khiem Phi | Noushin Salek Faramarzi | Chenlu Wang | Ritwik Banerjee
Findings of the Association for Computational Linguistics: ACL 2024

Whataboutism, a potent tool for disrupting narratives and sowing distrust, remains under-explored in quantitative NLP research. Moreover, past work has not distinguished its use as a strategy for misinformation and propaganda from its use as a tool for pragmatic and semantic framing. We introduce new datasets from Twitter/X and YouTube, revealing overlaps as well as distinctions between whataboutism, propaganda, and the tu quoque fallacy. Furthermore, drawing on recent work in linguistic semantics, we differentiate the ‘what about’ lexical construct from whataboutism. Our experiments bring to light unique challenges in its accurate detection, prompting the introduction of a novel method using attention weights for negative sample mining. We report significant improvements of 4% and 10% over previous state-of-the-art methods in our Twitter and YouTube collections, respectively.

2023

Context-aware Medication Event Extraction from Unstructured Text
Noushin Salek Faramarzi | Meet Patel | Sai Harika Bandarupally | Ritwik Banerjee
Proceedings of the 5th Clinical Natural Language Processing Workshop

Accurately capturing medication history is crucial in delivering high-quality medical care. The extraction of medication events from unstructured clinical notes, however, is challenging because the information is presented in complex narratives. We address this challenge by leveraging the newly released Contextualized Medication Event Dataset (CMED) as part of our participation in the 2022 National NLP Clinical Challenges (n2c2) shared task. Our study evaluates the performance of various pretrained language models in this task. Further, we find that data augmentation coupled with domain-specific training provides notable improvements. Our experiments also underscore the importance of careful data preprocessing in medical event detection.

2021

An Empirical Assessment of the Qualitative Aspects of Misinformation in Health News
Chaoyuan Zuo | Qi Zhang | Ritwik Banerjee
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

The explosion of online health news articles risks the proliferation of low-quality information. Within the existing work on fact-checking, however, relatively little attention has been paid to medical news. We present a health news classification task to determine whether medical news articles satisfy a set of review criteria deemed important by medical experts and health care journalists. We present a dataset of 1,119 health news articles paired with systematic reviews. The review criteria consist of six elements that are essential to the accuracy of medical news. We then present experiments comparing the classical token-based approach with the more recent transformer-based models. Our results show that detecting qualitative lapses is a challenging task with direct ramifications in misinformation, but is an important direction to pursue beyond assigning True or False labels to short claims.

An Investigation into the Contribution of Locally Aggregated Descriptors to Figurative Language Identification
Sina Mahdipour Saravani | Ritwik Banerjee | Indrakshi Ray
Proceedings of the Second Workshop on Insights from Negative Results in NLP

In natural language understanding, topics that touch upon figurative language and pragmatics are notably difficult. We probe a novel use of locally aggregated descriptors – specifically, an architecture called NeXtVLAD – which, motivated by its accomplishments in computer vision, achieved tremendous success in the FigLang2020 sarcasm detection task. The reported F1 score of 93.1% is 14% higher than the next best result. We specifically investigate the extent to which the novel architecture is responsible for this boost, and find that it does not provide statistically significant benefits. Deep learning approaches are expensive, and we hope our insights highlighting the lack of benefits from introducing a resource-intensive component will help future research distill the effective elements from long and complex pipelines, thereby providing a boost to the wider research community.

2020

Querying Across Genres for Medical Claims in News
Chaoyuan Zuo | Narayan Acharya | Ritwik Banerjee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a query-based biomedical information retrieval task across two vastly different genres – newswire and research literature – where the goal is to find the research publication that supports the primary claim made in a health-related news article. For this task, we present a new dataset of 5,034 claims from news paired with research abstracts. Our approach consists of two steps: (i) selecting the most relevant candidates from a collection of 222k research abstracts, and (ii) re-ranking this list. We compare the classical IR approach using BM25 with more recent transformer-based models. Our results show that cross-genre medical IR is a viable task, but incorporating domain-specific knowledge is crucial.
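The two-step approach described in this abstract (candidate selection followed by re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the `BM25` class, the parameter defaults `k1=1.5` and `b=0.75`, the whitespace tokenizer, and the toy documents are all assumptions made for illustration, and `overlap_scorer` is a hypothetical stand-in for the transformer-based re-ranker, which the abstract does not specify.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


class BM25:
    """Small Okapi BM25 scorer over an in-memory list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        # Step (i): select the k highest-scoring candidate documents.
        order = sorted(range(len(self.docs)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def overlap_scorer(query, doc):
    # Hypothetical placeholder re-ranker: Jaccard overlap of token sets.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query, candidates, docs, scorer):
    # Step (ii): re-order the candidate list with a second scoring function.
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


# Toy data, invented for this example.
docs = [
    "coffee consumption lowers risk of heart disease",
    "deep learning for image classification",
    "green tea and heart health in adults",
]
query = "coffee reduces heart disease risk"

bm25 = BM25(docs)
candidates = bm25.top_k(query, k=2)
ranked = rerank(query, candidates, docs, overlap_scorer)
```

In a realistic setting, the scorer passed to `rerank` would be a learned model rather than token overlap; the sketch only shows how the two stages fit together.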

2014

Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays
Ritwik Banerjee | Song Feng | Jun Seok Kang | Yejin Choi
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

Syntactic Stylometry for Deception Detection
Song Feng | Ritwik Banerjee | Yejin Choi
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Characterizing Stylistic Elements in Syntactic Structure
Song Feng | Ritwik Banerjee | Yejin Choi
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning