Gopendra Vikram Singh


2024

Deciphering Cognitive Distortions in Patient-Doctor Mental Health Conversations: A Multimodal LLM-Based Detection and Reasoning Framework
Gopendra Vikram Singh | Sai Vardhan Vemulapalli | Mauajama Firdaus | Asif Ekbal
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Cognitive distortion research holds increasing significance as it sheds light on pervasive errors in thinking patterns, providing crucial insights into mental health challenges and fostering the development of targeted interventions and therapies. This paper delves into the complex domain of cognitive distortions, prevalent errors in cognitive processes that are often associated with mental health issues. Focusing on patient-doctor dialogues, we introduce a pioneering method for detecting and reasoning about cognitive distortions utilizing Large Language Models (LLMs). Operating within a multimodal context encompassing audio, video, and textual data, our approach underscores the critical importance of integrating diverse modalities for a comprehensive understanding of cognitive distortions. By leveraging this multimodal information, our method offers a nuanced perspective that enhances the accuracy and depth of cognitive distortion detection and reasoning in a zero-shot manner. Our proposed hierarchical framework adeptly tackles both the detection and reasoning tasks, showing significant performance gains over current methodologies. Through comprehensive analysis, we elucidate the efficacy of our approach, offering promising insights into the diagnosis and understanding of cognitive distortions in multimodal settings. The code and dataset can be found here: https://github.com/clang1234/ZS-CoDR.git
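For readers curious how such a zero-shot pipeline might look in practice, the sketch below illustrates a two-stage (detect, then reason) prompting scheme over a textually fused multimodal context. The distortion label set, prompt wording, and the generate callable are all assumptions made for illustration; the authors' actual implementation is in the linked repository.

```python
# Illustrative sketch only: a two-stage (detect -> reason) zero-shot prompting
# scheme over a fused multimodal context. Labels, prompts, and the generate
# callable are hypothetical, not the authors' released code.

from typing import Callable

DISTORTIONS = [
    "all-or-nothing thinking", "overgeneralization", "mental filtering",
    "mind reading", "catastrophizing", "personalization", "no distortion",
]

def build_context(transcript: str, audio_desc: str, video_desc: str) -> str:
    """Fuse the three modalities into one textual context for the LLM."""
    return (
        f"Patient utterance (text): {transcript}\n"
        f"Acoustic cues (described): {audio_desc}\n"
        f"Visual cues (described): {video_desc}"
    )

def detect_and_reason(context: str, generate: Callable[[str], str]) -> dict:
    """Stage 1 detects the distortion label; stage 2 asks for the reasoning."""
    detect_prompt = (
        "You are assisting a clinician. Given the multimodal context below, "
        f"answer with exactly one label from {DISTORTIONS}.\n\n{context}\n\nLabel:"
    )
    label = generate(detect_prompt).strip().lower()

    reason_prompt = (
        f"The utterance was labelled '{label}'. Using the same context, explain "
        f"in two sentences which cues support this label.\n\n{context}\n\nReasoning:"
    )
    return {"label": label, "reasoning": generate(reason_prompt).strip()}

if __name__ == "__main__":
    # Stub generator so the sketch runs without an LLM backend.
    dummy = lambda prompt: (
        "catastrophizing" if prompt.endswith("Label:")
        else "The patient predicts the worst outcome; flat prosody and averted gaze are consistent."
    )
    print(detect_and_reason(build_context(
        "If I fail this interview, my whole career is over.",
        "slow speech rate, flat prosody", "averted gaze, slumped posture"), dummy))
```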

2023

Leveraging Empathy, Distress, and Emotion for Accurate Personality Subtyping from Complex Human Textual Responses
Soumitra Ghosh | Tanisha Tiwari | Chetna Painkra | Gopendra Vikram Singh | Asif Ekbal
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Automated personality subtyping is a crucial area of research with diverse applications in psychology, healthcare, and marketing. However, current studies face challenges such as insufficient data, noisy text data, and difficulty in capturing complex personality traits. To address these issues, incorporating empathy, distress, and emotion as auxiliary tasks may enhance the accuracy and robustness of automated personality subtyping. This study introduces a Multi-input Multi-task Framework for Personality, Empathy, Distress, and Emotion Detection (MultiPEDE). The framework harnesses the complementary information from the empathy, distress, and emotion tasks (auxiliary tasks) to enhance the accuracy and generalizability of automated personality subtyping (the primary task). The model uses a novel deep-learning architecture that captures the interdependencies between these constructs, is end-to-end trainable, and does not rely on ensemble strategies, making it practical for real-world applications. Performance is evaluated on labeled examples covering five personality traits, with two classes each for personality, empathy, and distress detection and seven classes for emotion detection. This approach has diverse applications, including mental health diagnosis, improving online services, and aiding job candidate selection.
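The multi-input multi-task idea can be illustrated with a minimal hard-parameter-sharing sketch: one shared encoder feeds task-specific heads for the primary personality task and the empathy, distress, and emotion auxiliaries. The encoder choice, dimensions, and loss weights below are assumptions for illustration, not the MultiPEDE architecture itself.

```python
# Minimal hard-parameter-sharing sketch of the multitask idea: a shared text
# encoder with task-specific heads. All sizes and weights are assumptions.

import torch
import torch.nn as nn

class MultiTaskPersonality(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Task heads: 5 binary personality traits, binary empathy/distress, 7 emotions.
        self.personality = nn.Linear(d_model, 5)   # one logit per trait (sigmoid)
        self.empathy = nn.Linear(d_model, 2)
        self.distress = nn.Linear(d_model, 2)
        self.emotion = nn.Linear(d_model, 7)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids)).mean(dim=1)  # mean-pooled response vector
        return {
            "personality": self.personality(h),
            "empathy": self.empathy(h),
            "distress": self.distress(h),
            "emotion": self.emotion(h),
        }

model = MultiTaskPersonality()
out = model(torch.randint(0, 30522, (4, 32)))  # batch of 4 responses, 32 tokens each
# Auxiliary losses are down-weighted so the primary task dominates (weights assumed).
loss = (nn.functional.binary_cross_entropy_with_logits(
            out["personality"], torch.randint(0, 2, (4, 5)).float())
        + 0.3 * nn.functional.cross_entropy(out["emotion"], torch.randint(0, 7, (4,))))
print(loss.item())
```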

2022

Are Emoji, Sentiment, and Emotion Friends? A Multi-task Learning for Emoji, Sentiment, and Emotion Analysis
Gopendra Vikram Singh | Dushyant Singh Chauhan | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues
Gopendra Vikram Singh | Priyanshu Priya | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The long-standing goal of Artificial Intelligence (AI) has been to create human-like conversational systems. Such systems should have the ability to develop an emotional connection with their users; consequently, emotion recognition in dialogues has gained popularity. Emotion detection in dialogues is a challenging task because humans usually convey multiple emotions with varying degrees of intensity in a single utterance. Moreover, the emotion in an utterance may depend on previous utterances, making the task more complex. Recently, emotion recognition in low-resource languages like Hindi has been in great demand. However, most of the existing datasets for multi-label emotion and intensity detection in conversations are in English. To this end, we propose EmoInHindi, a large conversational dataset in Hindi for multi-label emotion and intensity recognition in conversations, containing 1,814 dialogues with a total of 44,247 utterances. We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims. Each utterance of a dialogue is annotated with one or more emotion categories from 16 emotion labels, including neutral, and with the corresponding intensity. We further propose strong contextual baselines that can detect the emotion(s) and corresponding emotional intensity of an utterance given the conversational context.
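A contextual baseline of the kind described might look like the following sketch, where the current utterance together with preceding turns is encoded and two heads predict the 16 multi-label emotions and a per-emotion intensity. The architecture and sizes are illustrative assumptions, not the paper's exact baselines.

```python
# Sketch of a contextual multi-label emotion + intensity baseline: context turns
# and the current utterance are encoded jointly; a sigmoid head gives emotion
# probabilities and a parallel head gives per-emotion intensity. Sizes assumed.

import torch
import torch.nn as nn

class ContextualEmotionBaseline(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, num_emotions=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.context_gru = nn.GRU(d_model, d_model, batch_first=True)
        self.emotion_head = nn.Linear(d_model, num_emotions)     # multi-label logits
        self.intensity_head = nn.Linear(d_model, num_emotions)   # intensity per emotion

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -- current utterance prepended with context turns
        h, _ = self.context_gru(self.embed(token_ids))
        pooled = h[:, -1]                                # last hidden state as summary
        return torch.sigmoid(self.emotion_head(pooled)), self.intensity_head(pooled)

probs, intensity = ContextualEmotionBaseline()(torch.randint(0, 30522, (2, 64)))
print(probs.shape, intensity.shape)  # (2, 16) each
```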

A Sentiment and Emotion Aware Multimodal Multiparty Humor Recognition in Multilingual Conversational Setting
Dushyant Singh Chauhan | Gopendra Vikram Singh | Aseem Arora | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

In this paper, we hypothesize that humor is closely related to sentiment and emotions. Also, due to the tremendous growth in multilingual content, there is great demand for building models and systems that support multilingual information access. To this end, we first extend the recently released Multimodal Multiparty Hindi Humor (M2H2) dataset by adding parallel English utterances corresponding to the Hindi utterances and annotating each utterance with sentiment and emotion classes. We name it the Sentiment, Humor, and Emotion aware Multilingual Multimodal Multiparty Dataset (SHEMuD). We then propose a multitask framework wherein the primary task is humor detection and the auxiliary tasks are sentiment and emotion identification. Within this framework, we first design a Context Transformer to capture the deep contextual relationships within the input utterances, and then a Sentiment and Emotion aware Embedding (SE-Embedding) to obtain the overall representation of a particular emotion and sentiment w.r.t. the specific humor situation. Experimental results on SHEMuD show the efficacy of our approach and that multitask learning offers an improvement over the single-task framework in both monolingual (4.86 points in Hindi and 5.9 points in English in F1-score) and multilingual (5.17 points in F1-score) settings.
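The pipeline described above can be sketched roughly as follows: a contextual transformer encodes the utterance in its dialogue context, auxiliary sentiment and emotion heads produce predictions, and their outputs are folded back into the humor head as a stand-in for the SE-Embedding idea. All class counts and dimensions are assumptions for illustration, not the paper's configuration.

```python
# Rough sketch: contextual encoding, auxiliary sentiment/emotion heads, and a
# humor head that also sees the auxiliary evidence. Sizes are assumptions.

import torch
import torch.nn as nn

class HumorMultiTask(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.context_transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.sentiment_head = nn.Linear(d_model, 3)   # negative / neutral / positive
        self.emotion_head = nn.Linear(d_model, 7)     # emotion classes (assumed 7)
        # Humor head sees the utterance vector plus sentiment/emotion evidence.
        self.humor_head = nn.Linear(d_model + 3 + 7, 2)

    def forward(self, token_ids):
        h = self.context_transformer(self.embed(token_ids)).mean(dim=1)
        sent, emo = self.sentiment_head(h), self.emotion_head(h)
        humor = self.humor_head(torch.cat([h, sent.softmax(-1), emo.softmax(-1)], dim=-1))
        return humor, sent, emo

humor, sent, emo = HumorMultiTask()(torch.randint(0, 30522, (2, 48)))
print(humor.shape, sent.shape, emo.shape)
```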

COMMA-DEER: COmmon-sense Aware Multimodal Multitask Approach for Detection of Emotion and Emotional Reasoning in Conversations
Soumitra Ghosh | Gopendra Vikram Singh | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

Mental health is a critical component of the United Nations’ Sustainable Development Goals (SDGs), particularly Goal 3, which aims to provide “good health and well-being”. The present mental health treatment gap is exacerbated by stigma, lack of human resources, and lack of research capability for implementation and policy reform. We present and discuss a novel task of detecting emotional reasoning (ER) and the accompanying emotions in conversations. In particular, we create a first-of-its-kind multimodal mental health conversational corpus that is manually annotated at the utterance level with emotional reasoning and the related emotion. We develop a multimodal multitask framework with a novel multimodal feature fusion technique and a contextuality learning module to handle the two tasks. By leveraging multimodal sources of information and commonsense reasoning within a multitask framework, our proposed model produces strong results. We achieve performance gains of 6% accuracy and 4.62% F1 on the emotion detection task and 3.56% accuracy and 3.31% F1 on the ER detection task compared to the existing state-of-the-art model.
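As a rough illustration of such a multimodal multitask setup, the sketch below fuses precomputed text, audio, video, and commonsense feature vectors with a simple gated weighting and attaches separate emotion and ER heads. The feature sizes, fusion scheme, and class counts are assumptions rather than the COMMA-DEER design.

```python
# Illustrative sketch: gated fusion of precomputed modality features plus a
# commonsense vector, with two task heads (emotion, ER). Sizes are assumptions.

import torch
import torch.nn as nn

class MultimodalMultitaskSketch(nn.Module):
    def __init__(self, d_text=768, d_audio=128, d_video=512, d_cs=768, d_fused=256):
        super().__init__()
        self.proj = nn.ModuleDict({
            "text": nn.Linear(d_text, d_fused),
            "audio": nn.Linear(d_audio, d_fused),
            "video": nn.Linear(d_video, d_fused),
            "commonsense": nn.Linear(d_cs, d_fused),
        })
        self.gate = nn.Linear(4 * d_fused, 4)        # one weight per modality
        self.emotion_head = nn.Linear(d_fused, 7)    # emotion classes (assumed 7)
        self.er_head = nn.Linear(d_fused, 2)         # emotional reasoning: yes / no

    def forward(self, feats):
        projected = torch.stack([self.proj[k](feats[k]) for k in self.proj], dim=1)
        weights = self.gate(projected.flatten(1)).softmax(-1).unsqueeze(-1)
        fused = (weights * projected).sum(dim=1)     # weighted sum over modalities
        return self.emotion_head(fused), self.er_head(fused)

feats = {"text": torch.randn(2, 768), "audio": torch.randn(2, 128),
         "video": torch.randn(2, 512), "commonsense": torch.randn(2, 768)}
emotion_logits, er_logits = MultimodalMultitaskSketch()(feats)
print(emotion_logits.shape, er_logits.shape)
```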