Ritwik Banerjee

2025

Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
Vishnu Raja | Adithya V Ganesan | Anand Syamkumar | Ritwik Banerjee | H. Schwartz
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

State-of-the-art automatic speech recognition (ASR) models like Whisper perform poorly on atypical speech, such as that produced by individuals with dysarthria. Past works for atypical speech have mostly investigated fully personalized (or idiosyncratic) models, but modeling strategies that can both generalize and handle idiosyncrasy could be more effective for capturing atypical speech. To investigate this, we compare four strategies: (a) *normative* models trained on typical speech (no personalization), (b) *idiosyncratic* models completely personalized to individuals, (c) *dysarthric-normative* models trained on other dysarthric speakers, and (d) *dysarthric-idiosyncratic* models which combine strategies by first modeling normative patterns before adapting to individual speech. In this case study, we find the dysarthric-idiosyncratic model performs better than the idiosyncratic approach while requiring less than half as much personalized data (36.43 WER with 128 train size vs. 36.99 with 256). Further, we found that tuning the speech encoder alone (as opposed to the LM decoder) yielded the best results, reducing word error rate from 71% to 32% on average. Our findings highlight the value of leveraging both normative (cross-speaker) and idiosyncratic (speaker-specific) patterns to improve ASR for underrepresented speech populations. [GitHub: VishnuRaja98/Dysarthric-Speech-Transcription](https://github.com/VishnuRaja98/Dysarthric-Speech-Transcription)

pdf bib abs

Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks
Chenlu Wang | Weimin Lyu | Ritwik Banerjee
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Detecting deviant language such as sexism, or nuanced language such as metaphors or sarcasm, is crucial for enhancing the safety, clarity, and interpretation of social interactions. While existing classifiers deliver strong results on these tasks, they often come with significant computational cost and high data demands. In this work, we propose Class Distillation (ClaD), a novel training paradigm that targets the core challenge: distilling a small, well-defined target class from a highly diverse and heterogeneous background. ClaD integrates two key innovations: (i) a loss function informed by the structural properties of class distributions, based on Mahalanobis distance, and (ii) an interpretable decision algorithm optimized for class separation. Across three benchmark detection tasks – sexism, metaphor, and sarcasm – ClaD outperforms competitive baselines, and even with smaller language models and orders of magnitude fewer parameters, achieves performance comparable to several large language models. These results demonstrate ClaD as an efficient tool for pragmatic language understanding tasks that require gleaning a small target class from a larger heterogeneous background.

2024

pdf bib abs

Paying Attention to Deflections: Mining Pragmatic Nuances for Whataboutism Detection in Online Discourse
Khiem Phi | Noushin Salek Faramarzi | Chenlu Wang | Ritwik Banerjee
Findings of the Association for Computational Linguistics: ACL 2024

Whataboutism, a potent tool for disrupting narratives and sowing distrust, remains under-explored in quantitative NLP research. Moreover, past work has not distinguished its use as a strategy for misinformation and propaganda from its use as a tool for pragmatic and semantic framing. We introduce new datasets from Twitter/X and YouTube, revealing overlaps as well as distinctions between whataboutism, propaganda, and the tu quoque fallacy. Furthermore, drawing on recent work in linguistic semantics, we differentiate the ‘what about’ lexical construct from whataboutism. Our experiments bring to light unique challenges in its accurate detection, prompting the introduction of a novel method using attention weights for negative sample mining. We report significant improvements of 4% and 10% over previous state-of-the-art methods in our Twitter and YouTube collections, respectively.

2023

pdf bib abs

Context-aware Medication Event Extraction from Unstructured Text
Noushin Salek Faramarzi | Meet Patel | Sai Harika Bandarupally | Ritwik Banerjee
Proceedings of the 5th Clinical Natural Language Processing Workshop

Accurately capturing medication history is crucial in delivering high-quality medical care. The extraction of medication events from unstructured clinical notes, however, is challenging because the information is presented in complex narratives. We address this challenge by leveraging the newly released Contextualized Medication Event Dataset (CMED) as part of our participation in the 2022 National NLP Clinical Challenges (n2c2) shared task. Our study evaluates the performance of various pretrained language models in this task. Further, we find that data augmentation coupled with domain-specific training provides notable improvements. With experiments, we also underscore the importance of careful data preprocessing in medical event detection.

2021

pdf bib abs

An Investigation into the Contribution of Locally Aggregated Descriptors to Figurative Language Identification
Sina Mahdipour Saravani | Ritwik Banerjee | Indrakshi Ray
Proceedings of the Second Workshop on Insights from Negative Results in NLP

In natural language understanding, topics that touch upon figurative language and pragmatics are notably difficult. We probe a novel use of locally aggregated descriptors – specifically, an architecture called NeXtVLAD – motivated by its accomplishments in computer vision, achieve tremendous success in the FigLang2020 sarcasm detection task. The reported F1 score of 93.1% is 14% higher than the next best result. We specifically investigate the extent to which the novel architecture is responsible for this boost, and find that it does not provide statistically significant benefits. Deep learning approaches are expensive, and we hope our insights highlighting the lack of benefits from introducing a resource-intensive component will aid future research to distill the effective elements from long and complex pipelines, thereby providing a boost to the wider research community.

pdf bib abs

An Empirical Assessment of the Qualitative Aspects of Misinformation in Health News
Chaoyuan Zuo | Qi Zhang | Ritwik Banerjee
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

The explosion of online health news articles runs the risk of the proliferation of low-quality information. Within the existing work on fact-checking, however, relatively little attention has been paid to medical news. We present a health news classification task to determine whether medical news articles satisfy a set of review criteria deemed important by medical experts and health care journalists. We present a dataset of 1,119 health news paired with systematic reviews. The review criteria consist of six elements that are essential to the accuracy of medical news. We then present experiments comparing the classical token-based approach with the more recent transformer-based models. Our results show that detecting qualitative lapses is a challenging task with direct ramifications in misinformation, but is an important direction to pursue beyond assigning True or False labels to short claims.

2020

pdf bib abs

Querying Across Genres for Medical Claims in News
Chaoyuan Zuo | Narayan Acharya | Ritwik Banerjee
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We present a query-based biomedical information retrieval task across two vastly different genres – newswire and research literature – where the goal is to find the research publication that supports the primary claim made in a health-related news article. For this task, we present a new dataset of 5,034 claims from news paired with research abstracts. Our approach consists of two steps: (i) selecting the most relevant candidates from a collection of 222k research abstracts, and (ii) re-ranking this list. We compare the classical IR approach using BM25 with more recent transformer-based models. Our results show that cross-genre medical IR is a viable task, but incorporating domain-specific knowledge is crucial.

Ritwik Banerjee

2025

2024

2023

2021

2020

2014

2012

Co-authors

Venues