Mauajama Firdaus


2024

pdf bib
Deciphering Cognitive Distortions in Patient-Doctor Mental Health Conversations: A Multimodal LLM-Based Detection and Reasoning Framework
Gopendra Vikram Singh | Sai Vardhan Vemulapalli | Mauajama Firdaus | Asif Ekbal
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Cognitive distortion research holds increasing significance as it sheds light on pervasive errors in thinking patterns, providing crucial insights into mental health challenges and fostering the development of targeted interventions and therapies. This paper delves into the complex domain of cognitive distortions which are prevalent distortions in cognitive processes often associated with mental health issues. Focusing on patient-doctor dialogues, we introduce a pioneering method for detecting and reasoning about cognitive distortions utilizing Large Language Models (LLMs). Operating within a multimodal context encompassing audio, video, and textual data, our approach underscores the critical importance of integrating diverse modalities for a comprehensive understanding of cognitive distortions. By leveraging multimodal information, including audio, video, and textual data, our method offers a nuanced perspective that enhances the accuracy and depth of cognitive distortion detection and reasoning in a zero-shot manner. Our proposed hierarchical framework adeptly tackles both detection and reasoning tasks, showcasing significant performance enhancements compared to current methodologies. Through comprehensive analysis, we elucidate the efficacy of our approach, offering promising insights into the diagnosis and understanding of cognitive distortions in multimodal settings.The code and dataset can be found here: https://github.com/clang1234/ZS-CoDR.git

pdf bib
On the Way to Gentle AI Counselor: Politeness Cause Elicitation and Intensity Tagging in Code-mixed Hinglish Conversations for Social Good
Priyanshu Priya | Gopendra Singh | Mauajama Firdaus | Jyotsna Agrawal | Asif Ekbal
Findings of the Association for Computational Linguistics: NAACL 2024

Politeness is a multifaceted concept influenced by individual perceptions of what is considered polite or impolite. With this objective, we introduce a novel task - Politeness Cause Elicitation and Intensity Tagging (PCEIT). This task focuses on conversations and aims to identify the underlying reasons behind the use of politeness and gauge the degree of politeness conveyed. To address this objective, we create HING-POEM, a new conversational dataset in Hinglish (a blend of Hindi and English) for mental health and legal counseling of crime victims. The rationale for the domain selection lies in the paramount importance of politeness in mental health and legal counseling of crime victims to ensure a compassionate and cordial atmosphere for them. We enrich the HING-POEM dataset by annotating it with politeness labels, politeness causal spans, and intensity values at the level of individual utterances. In the context of the introduced PCEIT task, we present PAANTH (Politeness CAuse ElicitAion and INtensity Tagging in Hinglish), a comprehensive framework based on Contextual Enhanced Attentive Convolution Transformer. We conduct extensive quantitative and qualitative evaluations to establish the effectiveness of our proposed approach using the newly constructed dataset. Our approach is compared against state-of-the-art baselines, and these analyses help demonstrate the superiority of our method.

pdf bib
Knowledge-enhanced Response Generation in Dialogue Systems: Current Advancements and Emerging Horizons
Priyanshu Priya | Deeksha Varshney | Mauajama Firdaus | Asif Ekbal
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): Tutorial Summaries

This tutorial provides an in-depth exploration of Knowledge-enhanced Dialogue Systems (KEDS), diving into their foundational aspects, methodologies, advantages, and practical applications. Topics include the distinction between internal and external knowledge integration, diverse methodologies employed in grounding dialogues, and innovative approaches to leveraging knowledge graphs for enhanced conversation quality. Furthermore, the tutorial touches upon the rise of biomedical text mining, the advent of domain-specific language models, and the challenges and strategies specific to medical dialogue generation. The primary objective is to give attendees a comprehensive understanding of KEDS. By delineating the nuances of these systems, the tutorial aims to elucidate their significance, highlight advancements made using deep learning, and pinpoint the current challenges. Special emphasis is placed on showcasing how KEDS can be fine-tuned for domain-specific requirements, with a spotlight on the healthcare sector. The tutorial is crafted for both beginners and intermediate researchers in the dialogue systems domain, with a focus on those keen on advancing research in KEDS. It will also be valuable for practitioners in sectors like healthcare, seeking to integrate advanced dialogue systems.

2023

pdf bib
Prompt-Based Editing for Text Style Transfer
Guoqing Luo | Yu Han | Lili Mou | Mauajama Firdaus
Findings of the Association for Computational Linguistics: EMNLP 2023

Prompting approaches have been recently explored in text style transfer, where a textual prompt is used to query a pretrained language model (PLM) to generate style-transferred texts word by word in an autoregressive manner. However, such a generation process is less controllable and early prediction errors may affect future word predictions. In this paper, we propose a prompt-based editing approach to text style transfer. Specifically, we prompt a PLM for style classification and use the classification probability to compute a style score. Then, we perform discrete search with word-level editing to maximize a comprehensive scoring function for the style-transfer task. In this way, we transform a prompt-based generation problem into a classification one, which does not suffer from the error accumulation problem and is more controllable than the autoregressive generation of sentences. In our experiments, we performed both automatic and human evaluation on three style-transfer benchmark datasets, and show that our approach largely outperforms the existing systems that have 20 times more parameters. Additional empirical analyses further demonstrate the effectiveness of our approach.

pdf bib
Mixing It Up: Inducing Empathy and Politeness using Multiple Behaviour-aware Generators for Conversational Systems
Mauajama Firdaus | Priyanshu Priya | Asif Ekbal
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

pdf bib
A Unified Framework for Emotion Identification and Generation in Dialogues
Avinash Madasu | Mauajama Firdaus | Asif Ekbal
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Social chatbots have gained immense popularity, and their appeal lies not just in their capacity to respond to the diverse requests from users, but also in the ability to develop an emotional connection with users. To further develop and promote social chatbots, we need to concentrate on increasing user interaction and take into account both the intellectual and emotional quotient in the conversational agents. In this paper, we propose a multi-task framework that jointly identifies the emotion of a given dialogue and generates response in accordance to the identified emotion. We employ a {BERT} based network for creating an empathetic system and use a mixed objective function that trains the end-to-end network with both the classification and generation loss. Experimental results show that our proposed framework outperforms current state-of-the-art models.

2022

pdf bib
Are Emoji, Sentiment, and Emotion Friends? A Multi-task Learning for Emoji, Sentiment, and Emotion Analysis
Gopendra Vikram Singh | Dushyant Singh Chauhan | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

pdf bib
Empathetic Persuasion: Reinforcing Empathy and Persuasiveness in Dialogue Systems
Azlaan Mustafa Samad | Kshitij Mishra | Mauajama Firdaus | Asif Ekbal
Findings of the Association for Computational Linguistics: NAACL 2022

Persuasion is an intricate process involving empathetic connection between two individuals. Plain persuasive responses may make a conversation non-engaging. Even the most well-intended and reasoned persuasive conversations can fall through in the absence of empathetic connection between the speaker and listener. In this paper, we propose a novel task of incorporating empathy when generating persuasive responses. We develop an empathetic persuasive dialogue system by fine-tuning a maximum likelihood Estimation (MLE)-based language model in a reinforcement learning (RL) framework. To design feedbacks for our RL-agent, we define an effective and efficient reward function considering consistency, repetitiveness, emotion and persuasion rewards to ensure consistency, non-repetitiveness, empathy and persuasiveness in the generated responses. Due to lack of emotion annotated persuasive data, we first annotate the existing Persuaion For Good dataset with emotions, then build transformer based classifiers to provide emotion based feedbacks to our RL agent. Experimental results confirm that our proposed model increases the rate of generating persuasive responses as compared to the available state-of-the-art dialogue models while making the dialogues empathetically more engaging and retaining the language quality in responses.

pdf bib
EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues
Gopendra Vikram Singh | Priyanshu Priya | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The long-standing goal of Artificial Intelligence (AI) has been to create human-like conversational systems. Such systems should have the ability to develop an emotional connection with the users, consequently, emotion recognition in dialogues has gained popularity. Emotion detection in dialogues is a challenging task because humans usually convey multiple emotions with varying degrees of intensities in a single utterance. Moreover, emotion in an utterance of a dialogue may be dependent on previous utterances making the task more complex. Recently, emotion recognition in low-resource languages like Hindi has been in great demand. However, most of the existing datasets for multi-label emotion and intensity detection in conversations are in English. To this end, we propose a large conversational dataset in Hindi named EmoInHindi for multi-label emotion and intensity recognition in conversations containing 1,814 dialogues with a total of 44,247 utterances. We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims. Each utterance of dialogue is annotated with one or more emotion categories from 16 emotion labels including neutral and their corresponding intensity. We further propose strong contextual baselines that can detect the emotion(s) and corresponding emotional intensity of an utterance given the conversational context.

pdf bib
PoliSe: Reinforcing Politeness Using User Sentiment for Customer Care Response Generation
Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 29th International Conference on Computational Linguistics

The interaction between a consumer and the customer service representative greatly contributes to the overall customer experience. Therefore, to ensure customers’ comfort and retention, it is important that customer service agents and chatbots connect with users on social, cordial, and empathetic planes. In the current work, we automatically identify the sentiment of the user and transform the neutral responses into polite responses conforming to the sentiment and the conversational history. Our technique is basically a reinforced multi-task network- the primary task being ‘polite response generation’ and the secondary task being ‘sentiment analysis’- that uses a Transformer based encoder-decoder. We use sentiment annotated conversations from Twitter as the training data. The detailed evaluation shows that our proposed approach attains superior performance compared to the baseline models.

2021

pdf bib
SEPRG: Sentiment aware Emotion controlled Personalized Response Generation
Mauajama Firdaus | Umang Jain | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Generation

Social chatbots have gained immense popularity, and their appeal lies not just in their capacity to respond to the diverse requests from users, but also in the ability to develop an emotional connection with users. To further develop and promote social chatbots, we need to concentrate on increasing user interaction and take into account both the intellectual and emotional quotient in the conversational agents. Therefore, in this work, we propose the task of sentiment aware emotion controlled personalized dialogue generation giving the machine the capability to respond emotionally and in accordance with the persona of the user. As sentiment and emotions are highly co-related, we use the sentiment knowledge of the previous utterance to generate the correct emotional response in accordance with the user persona. We design a Transformer based Dialogue Generation framework, that generates responses that are sensitive to the emotion of the user and corresponds to the persona and sentiment as well. Moreover, the persona information is encoded by a different Transformer encoder, along with the dialogue history, is fed to the decoder for generating responses. We annotate the PersonaChat dataset with sentiment information to improve the response quality. Experimental results on the PersonaChat dataset show that the proposed framework significantly outperforms the existing baselines, thereby generating personalized emotional responses in accordance with the sentiment that provides better emotional connection and user satisfaction as desired in a social chatbot.

2020

pdf bib
MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations
Mauajama Firdaus | Hardik Chauhan | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

Emotion and sentiment classification in dialogues is a challenging task that has gained popularity in recent times. Humans tend to have multiple emotions with varying intensities while expressing their thoughts and feelings. Emotions in an utterance of dialogue can either be independent or dependent on the previous utterances, thus making the task complex and interesting. Multi-label emotion detection in conversations is a significant task that provides the ability to the system to understand the various emotions of the users interacting. Sentiment analysis in dialogue/conversation, on the other hand, helps in understanding the perspective of the user with respect to the ongoing conversation. Along with text, additional information in the form of audio and video assist in identifying the correct emotions with the appropriate intensity and sentiments in an utterance of a dialogue. Lately, quite a few datasets have been made available for dialogue emotion and sentiment classification, but these datasets are imbalanced in representing different emotions and consist of an only single emotion. Hence, we present at first a large-scale balanced Multimodal Multi-label Emotion, Intensity, and Sentiment Dialogue dataset (MEISD), collected from different TV series that has textual, audio and visual features, and then establish a baseline setup for further research.

pdf bib
Incorporating Politeness across Languages in Customer Care Responses: Towards building a Multi-lingual Empathetic Dialogue Agent
Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Twelfth Language Resources and Evaluation Conference

Customer satisfaction is an essential aspect of customer care systems. It is imperative for such systems to be polite while handling customer requests/demands. In this paper, we present a large multi-lingual conversational dataset for English and Hindi. We choose data from Twitter having both generic and courteous responses between customer care agents and aggrieved users. We also propose strong baselines that can induce courteous behaviour in generic customer care response in a multi-lingual scenario. We build a deep learning framework that can simultaneously handle different languages and incorporate polite behaviour in the customer care agent’s responses. Our system is competent in generating responses in different languages (here, English and Hindi) depending on the customer’s preference and also is able to converse with humans in an empathetic manner to ensure customer satisfaction and retention. Experimental results show that our proposed models can converse in both the languages and the information shared between the languages helps in improving the performance of the overall system. Qualitative and quantitative analysis shows that the proposed method can converse in an empathetic manner by incorporating courteousness in the responses and hence increasing customer satisfaction.

pdf bib
MultiDM-GCN: Aspect-guided Response Generation in Multi-domain Multi-modal Dialogue System using Graph Convolutional Network
Mauajama Firdaus | Nidhi Thakur | Asif Ekbal
Findings of the Association for Computational Linguistics: EMNLP 2020

In the recent past, dialogue systems have gained immense popularity and have become ubiquitous. During conversations, humans not only rely on languages but seek contextual information through visual contents as well. In every task-oriented dialogue system, the user is guided by the different aspects of a product or service that regulates the conversation towards selecting the product or service. In this work, we present a multi-modal conversational framework for a task-oriented dialogue setup that generates the responses following the different aspects of a product or service to cater to the user’s needs. We show that the responses guided by the aspect information provide more interactive and informative responses for better communication between the agent and the user. We first create a Multi-domain Multi-modal Dialogue (MDMMD) dataset having conversations involving both text and images belonging to the three different domains, such as restaurants, electronics, and furniture. We implement a Graph Convolutional Network (GCN) based framework that generates appropriate textual responses from the multi-modal inputs. The multi-modal information having both textual and image representation is fed to the decoder and the aspect information for generating aspect guided responses. Quantitative and qualitative analyses show that the proposed methodology outperforms several baselines for the proposed task of aspect-guided response generation.

2019

pdf bib
Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System
Hardik Chauhan | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multimodal dialogue systems have opened new frontiers in the traditional goal-oriented dialogue systems. The state-of-the-art dialogue systems are primarily based on unimodal sources, predominantly the text, and hence cannot capture the information present in the other sources such as videos, audios, images etc. With the availability of large scale multimodal dialogue dataset (MMD) (Saha et al., 2018) on the fashion domain, the visual appearance of the products is essential for understanding the intention of the user. Without capturing the information from both the text and image, the system will be incapable of generating correct and desirable responses. In this paper, we propose a novel position and attribute aware attention mechanism to learn enhanced image representation conditioned on the user utterance. Our evaluation shows that the proposed model can generate appropriate responses while preserving the position and attribute information. Experimental results also prove that our proposed approach attains superior performance compared to the baseline models, and outperforms the state-of-the-art approaches on text similarity based evaluation metrics.

pdf bib
Courteously Yours: Inducing courteous behavior in Customer Care responses using Reinforced Pointer Generator Network
Hitesh Golchha | Mauajama Firdaus | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we propose an effective deep learning framework for inducing courteous behavior in customer care responses. The interaction between a customer and the customer care representative contributes substantially to the overall customer experience. Thus it is imperative for customer care agents and chatbots engaging with humans to be personal, cordial and emphatic to ensure customer satisfaction and retention. Our system aims at automatically transforming neutral customer care responses into courteous replies. Along with stylistic transfer (of courtesy), our system ensures that responses are coherent with the conversation history, and generates courteous expressions consistent with the emotional state of the customer. Our technique is based on a reinforced pointer-generator model for the sequence to sequence task. The model is also conditioned on a hierarchically encoded and emotionally aware conversational context. We use real interactions on Twitter between customer care professionals and aggrieved customers to create a large conversational dataset having both forms of agent responses: ‘generic’ and ‘courteous’. We perform quantitative and qualitative analyses on established and task-specific metrics, both automatic and human evaluation based. Our evaluation shows that the proposed models can generate emotionally-appropriate courteous expressions while preserving the content. Experimental results also prove that our proposed approach performs better than the baseline models.