Sriparna Saha

2025

pdf bib abs
Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
Sofia Jamil | Bollampalli Areen Reddy | Raghvendra Kumar | Sriparna Saha | Joseph K. J | Koustava Goswami
Proceedings of the 31st International Conference on Computational Linguistics

The task of text-to-image generation has encountered significant challenges when applied to literary works, especially poetry. Poems are a distinct form of literature, with meanings that frequently transcend beyond the literal words. To address this shortcoming, we propose a PoemToPixel framework designed to generate images that visually represent the inherent meanings of poems. Our approach incorporates the concept of prompt tuning in our image generation framework to ensure that the resulting images closely align with the poetic content. In addition, we propose the PoeKey algorithm, which extracts three key elements in the form of emotions, visual elements, and themes from poems to form instructions which are subsequently provided to a diffusion model for generating corresponding images. Furthermore, to expand the diversity of the poetry dataset across different genres and ages, we introduce MiniPo, a novel multimodal dataset comprising 1001 children’s poems and images. Leveraging this dataset alongside PoemSum, we conducted both quantitative and qualitative evaluations of image generation using our PoemToPixel framework. This paper demonstrates the effectiveness of our approach and offers a fresh perspective on generating images from literary sources. The code and dataset used in this work are publicly available.

2024

pdf bib abs
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Prince Jha | Raghav Jain | Konika Mandal | Aman Chadha | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (MKS) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the Intervening Cyberbullying in Multimodal Memes (ICMM) dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes. red Disclaimer: This paper contains harmful content that may be disturbing to some readers.

pdf bib abs
From Sights to Insights: Towards Summarization of Multimodal Clinical Documents
Akash Ghosh | Mohit Tomar | Abhisek Tiwari | Sriparna Saha | Jatin Salve | Setu Sinha
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The advancement of Artificial Intelligence is pivotal in reshaping healthcare, enhancing diagnostic precision, and facilitating personalized treatment strategies. One major challenge for healthcare professionals is quickly navigating through long clinical documents to provide timely and effective solutions. Doctors often struggle to draw quick conclusions from these extensive documents. To address this issue and save time for healthcare professionals, an effective summarization model is essential. Most current models assume the data is only text-based. However, patients often include images of their medical conditions in clinical documents. To effectively summarize these multimodal documents, we introduce EDI-Summ, an innovative Image-Guided Encoder-Decoder Model. This model uses modality-aware contextual attention on the encoder and an image cross-attention mechanism on the decoder, enhancing the BART base model to create detailed visual-guided summaries. We have tested our model extensively on three multimodal clinical benchmarks involving multimodal question and dialogue summarization tasks. Our analysis demonstrates that EDI-Summ outperforms state-of-the-art large language and vision-aware models in these summarization tasks. Disclaimer: The work includes vivid medical illustrations, depicting the essential aspects of the subject matter.

pdf bib abs
Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations
Prince Jha | Krishanu Maity | Raghav Jain | Apoorv Verma | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Internet memes have gained significant influence in communicating political, psychological, and sociocultural ideas. While meme are often humorous, there has been a rise in the use of memes for trolling and cyberbullying. Although a wide variety of effective deep learning-based models have been developed for detecting offensive multimodal memes, only a few works have been done on explainability aspect. Recent laws like “right to explanations” of General Data Protection Regulation, have spurred research in developing interpretable models rather than only focusing on performance. Motivated by this, we introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes. Here, both visual and textual modalities are highlighted to explain why a given meme is cyberbullying. A Contrastive Language-Image Pretraining (CLIP) projection based multimodal shared-private multitask approach has been proposed for visual and textual explanation of a meme. Experimental results demonstrate that training with multimodal explanations improves performance in generating textual justifications and more accurately identifying the visual evidence supporting a decision with reliable performance improvements.

pdf bib abs
ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos
Krishanu Maity | A.S. Poornash | Sriparna Saha | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: ACL 2024

In an era of rapidly evolving internet technology, the surge in multimodal content, including videos, has expanded the horizons of online communication. However, the detection of toxic content in this diverse landscape, particularly in low-resource code-mixed languages, remains a critical challenge. While substantial research has addressed toxic content detection in textual data, the realm of video content, especially in non-English languages, has been relatively underexplored. This paper addresses this research gap by introducing a benchmark dataset, the first of its kind, consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube. Each utterance within this dataset has been meticulously annotated for toxicity, severity, and sentiment labels. We have developed an advanced Multimodal Multitask framework built for Toxicity detection in Video Content by leveraging Language Models (LMs), crafted for the primary objective along with the additional tasks of conducting sentiment and severity analysis. ToxVidLM incorporates three key modules – the Encoder module, Cross-Modal Synchronization module, and Multitask module – crafting a generic multimodal LM customized for intricate video classification tasks. Our experiments reveal that incorporating multiple modalities from the videos substantially enhances the performance of toxic content detection by achieving an Accuracy and Weighted F1 score of 94.29% and 94.35%, respectively.

pdf bib abs
Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
Pranab Sahoo | Ayush Singh | Sriparna Saha | Aman Chadha | Samrat Mondal
Findings of the Association for Computational Linguistics: ACL 2024

The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponential increase in data sources like social media content, biomedical literature, and Electronic Medical Records (EMR), extracting relevant ADE-related information from these unstructured texts is imperative. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids. Additionally, we introduce a framework that leverages the capabilities of LLMs and VLMs for ADE detection by generating detailed descriptions of medical images depicting ADEs, aiding healthcare professionals in visually identifying adverse events. Using our MMADE dataset, we showcase the significance of integrating visual cues from images to enhance overall performance. This approach holds promise for patient safety, ADE awareness, and healthcare accessibility, paving the way for further exploration in personalized healthcare.

pdf bib abs
HealthAlignSumm : Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues
Akash Ghosh | Arkadeep Acharya | Sriparna Saha | Gaurav Pandey | Dinesh Raghu | Setu Sinha
Findings of the Association for Computational Linguistics: EMNLP 2024

As generative AI progresses, collaboration be-tween doctors and AI scientists is leading to thedevelopment of personalized models to stream-line healthcare tasks and improve productivity.Summarizing doctor-patient dialogues has be-come important, helping doctors understandconversations faster and improving patient care.While previous research has mostly focused ontext data, incorporating visual cues from pa-tient interactions allows doctors to gain deeperinsights into medical conditions. Most of thisresearch has centered on English datasets, butreal-world conversations often mix languagesfor better communication. To address the lackof resources for multimodal summarization ofcode-mixed dialogues in healthcare, we devel-oped the MCDH dataset. Additionally, we cre-ated HealthAlignSumm, a new model that in-tegrates visual components with the BART ar-chitecture. This represents a key advancementin multimodal fusion, applied within both theencoder and decoder of the BART model. Ourwork is the first to use alignment techniques,including state-of-the-art algorithms like DirectPreference Optimization, on encoder-decodermodels with synthetic datasets for multimodalsummarization. Through extensive experi-ments, we demonstrated the superior perfor-mance of HealthAlignSumm across severalmetrics validated by both automated assess-ments and human evaluations. The datasetMCDH and our proposed model HealthAlign-Summ will be available in this GitHub accounthttps://github.com/AkashGhosh/HealthAlignSumm-Utilizing-Alignment-for-Multimodal-Summarization-of-Code-Mixed-Healthcare-DialoguesDisclaimer: This work involves medical im-agery based on the subject matter of the topic.

pdf bib abs
A Comprehensive Survey of Hallucination in Large Language, Image, Video and Audio Foundation Models
Pranab Sahoo | Prabhash Meharia | Akash Ghosh | Sriparna Saha | Vinija Jain | Aman Chadha
Findings of the Association for Computational Linguistics: EMNLP 2024

The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research and development in this pivotal area.

pdf bib
Wealth Guide: A Sophisticated Language Model Solution for Financial Trading Decisions
Sarmistha Das | R E Zera Marveen Lyngkhoi | Sriparna Saha | Alka Maurya
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning

pdf bib abs
Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction
Manas Jain | Sriparna Saha | Pushpak Bhattacharyya | Gladvin Chinnadurai | Manish Vatsa
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

Question Answering systems these days typically use template-based language generation. Though adequate for a domain-specific task, these systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (short spans such as named entities) as the input. Our system uses constituency and dependency parse trees of questions. A transformer-based Grammar Error Correction model GECToR is used as a post-processing step for better fluency. We compare our system with (i) a Modified Pointer Generator (SOTA) and (ii) Fine-tuned DialoGPT for factoid questions. We also tested our approach on existential (yes-no) questions with better results. Our model generates more accurate and fluent answers than the state-of-the-art (SOTA) approaches. The evaluation is done on NewsQA and SqUAD datasets with an increment of 0.4 and 0.9 percentage points in ROUGE-1 score respectively. Also, the inference time is reduced by 85% compared to the SOTA. The improved datasets used for our evaluation will be released as part of the research contribution.

pdf bib abs
Action and Reaction Go Hand in Hand! a Multi-modal Dialogue Act Aided Sarcasm Identification
Mohit Singh Tomar | Tulika Saha | Abhisek Tiwari | Sriparna Saha
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Sarcasm primarily involves saying something but “meaning the opposite” or “meaning something completely different” in order to convey a particular tone or mood. In both the above cases, the “meaning” is reflected by the communicative intention of the speaker, known as dialogue acts. In this paper, we seek to investigate a novel phenomenon of analyzing sarcasm in the context of dialogue acts with the hypothesis that the latter helps to understand the former better. Toward this aim, we extend the multi-modal MUStARD dataset to enclose dialogue acts for each dialogue. To demonstrate the utility of our hypothesis, we develop a dialogue act-aided multi-modal transformer network for sarcasm identification (MM-SARDAC), leveraging interrelation between these tasks. In addition, we introduce an order-infused, multi-modal infusion mechanism into our proposed model, which allows for a more intuitive combined modality representation by selectively focusing on relevant modalities in an ordered manner. Extensive empirical results indicate that dialogue act-aided sarcasm identification achieved better performance compared to performing sarcasm identification alone. The dataset and code are available at https://github.com/mohit2b/MM-SARDAC.

pdf bib abs
Seeing Is Believing! towards Knowledge-Infused Multi-modal Medical Dialogue Generation
Abhisek Tiwari | Shreyangshu Bera | Preeti Verma | Jaithra Varma Manthena | Sriparna Saha | Pushpak Bhattacharyya | Minakshi Dhar | Sarbajeet Tiwari
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Over the last few years, artificial intelligence-based clinical assistance has gained immense popularity and demand in telemedicine, including automatic disease diagnosis. Patients often describe their signs and symptoms to doctors using visual aids, which provide vital evidence for identifying a medical condition. In addition to learning from our experiences, we learn from well-established theories/ knowledge. With the motivation of leveraging visual cues and medical knowledge, we propose a transformer-based, knowledge-infused multi-modal medical dialogue generation (KI-MMDG) framework. In addition, we present a discourse-aware image identifier (DII) that recognizes signs and their severity by leveraging the current conversation context in addition to the image of the signs. We first curate an empathy and severity-aware multi-modal medical dialogue (ES-MMD) corpus in English, which is annotated with intent, symptoms, and visual signs with severity information. Experimental results show the superior performance of the proposed KI-MMDG model over uni-modal and non-knowledge infused generative models, demonstrating the importance of visual signs and knowledge infusion in symptom investigation and diagnosis. We also observed that the DII model surpasses the existing state-of-the-art model by 7.84%, indicating the crucial significance of dialogue context for identifying a sign image surfaced during conversations. The code and dataset are available at https://github.com/NLP-RL/KI-MMDG.

2023

pdf bib abs
Peeking inside the black box: A Commonsense-aware Generative Framework for Explainable Complaint Detection
Apoorva Singh | Raghav Jain | Prince Jha | Sriparna Saha
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Complaining is an illocutionary act in which the speaker communicates his/her dissatisfaction with a set of circumstances and holds the hearer (the complainee) answerable, directly or indirectly. Considering breakthroughs in machine learning approaches, the complaint detection task has piqued the interest of the natural language processing (NLP) community. Most of the earlier studies failed to justify their findings, necessitating the adoption of interpretable models that can explain the model’s output in real time. We introduce an explainable complaint dataset, X-CI, the first benchmark dataset for explainable complaint detection. Each instance in the X-CI dataset is annotated with five labels: complaint label, emotion label, polarity label, complaint severity level, and rationale (explainability), i.e., the causal span explaining the reason for the complaint/non-complaint label. We address the task of explainable complaint detection and propose a commonsense-aware unified generative framework by reframing the multitask problem as a text-to-text generation task. Our framework can predict the complaint cause, severity level, emotion, and polarity of the text in addition to detecting whether it is a complaint or not. We further establish the advantages of our proposed model on various evaluation metrics over the state-of-the-art models and other baselines when applied to the X-CI dataset in both full and few-shot settings.

pdf bib abs
APTSumm at BioLaySumm Task 1: Biomedical Breakdown, Improving Readability by Relevancy Based Selection
A.s. Poornash | Atharva Deshmukh | Archit Sharma | Sriparna Saha
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

In this paper we tackle a lay summarization task which aims to produce lay-summary of biomedical articles. BioLaySumm in the BioNLP Workshop at ACL 2023 (Goldsack et al., 2023), has presented us with this lay summarization task for biomedical articles. Our proposed models provide a three-step abstractive approach for summarizing biomedical articles. Our methodology involves breaking down the original document into distinct sections, generating candidate summaries for each subsection, then finally re-ranking and selecting the top performing paragraph for each section. We run ablation studies to establish that each step in our pipeline is critical for improvement in the quality of lay summary. This model achieved the second-highest rank in terms of readability scores (Luo et al., 2022). Our work distinguishes itself from previous studies by not only considering the content of the paper but also its structure, resulting in more coherent and comprehensible lay summaries. We hope that our model for generating lay summaries of biomedical articles will be a useful resource for individuals across various domains, including academia, industry, and healthcare, who require rapid comprehension of key scientific research.

pdf bib abs
Large Scale Multi-Lingual Multi-Modal Summarization Dataset
Yash Verma | Anubhav Jangra | Raghvendra Verma | Sriparna Saha
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Significant developments in techniques such as encoder-decoder models have enabled us to represent information comprising multiple modalities. This information can further enhance many downstream tasks in the field of information retrieval and natural language processing; however, improvements in multi-modal techniques and their performance evaluation require large-scale multi-modal data which offers sufficient diversity. Multi-lingual modeling for a variety of tasks like multi-modal summarization, text generation, and translation leverages information derived from high-quality multi-lingual annotated data. In this work, we present the current largest multi-lingual multi-modal summarization dataset (M3LS), and it consists of over a million instances of document-image pairs along with a professionally annotated multi-modal summary for each pair. It is derived from news articles published by British Broadcasting Corporation(BBC) over a decade and spans 20 languages, targeting diversity across five language roots, it is also the largest summarization dataset for 13 languages and consists of cross-lingual summarization data for 2 languages. We formally define the multi-lingual multi-modal summarization task utilizing our dataset and report baseline scores from various state-of-the-art summarization techniques in a multi-lingual setting. We also compare it with many similar datasets to analyze the uniqueness and difficulty of M3LS. The dataset and code used in this work are made available at “https://github.com/anubhav-jangra/M3LS”.

pdf bib abs
Do Language Models Have a Common Sense regarding Time? Revisiting Temporal Commonsense Reasoning in the Era of Large Language Models
Raghav Jain | Daivik Sojitra | Arkadeep Acharya | Sriparna Saha | Adam Jatowt | Sandipan Dandapat
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Temporal reasoning represents a vital component of human communication and understanding, yet remains an underexplored area within the context of Large Language Models (LLMs). Despite LLMs demonstrating significant proficiency in a range of tasks, a comprehensive, large-scale analysis of their temporal reasoning capabilities is missing. Our paper addresses this gap, presenting the first extensive benchmarking of LLMs on temporal reasoning tasks. We critically evaluate 8 different LLMs across 6 datasets using 3 distinct prompting strategies. Additionally, we broaden the scope of our evaluation by including in our analysis 2 Code Generation LMs. Beyond broad benchmarking of models and prompts, we also conduct a fine-grained investigation of performance across different categories of temporal tasks. We further analyze the LLMs on varying temporal aspects, offering insights into their proficiency in understanding and predicting the continuity, sequence, and progression of events over time. Our findings reveal a nuanced depiction of the capabilities and limitations of the models within temporal reasoning, offering a comprehensive reference for future research in this pivotal domain.

pdf bib abs
Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification
Apoorva Singh | Siddarth Chandrasekar | Sriparna Saha | Tanmay Sen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Automatic detection of consumers’ complaints about items or services they buy can be critical for organizations and online merchants. Previous studies on complaint identification are limited to text. Images along with the reviews can provide cues to identify complaints better, thus emphasizing the importance of incorporating multi-modal inputs into the process. Generally, the customer’s emotional state significantly impacts the complaint expression; thus, the effect of emotion and sentiment on complaint identification must also be investigated. Furthermore, different organizations are usually not allowed to share their privacy-sensitive records due to data security and privacy concerns. Due to these issues, traditional models find it hard to understand and identify complaint patterns, particularly in the financial and healthcare sectors. In this work, we created a new dataset - Multi-modal Complaint Dataset (MCD), a collection of reviews and images of the products posted on the website of the retail giant Amazon. We propose a federated meta-learning-based multi-modal multi-task framework for identifying complaints considering emotion recognition and sentiment analysis as two auxiliary tasks. Experimental results indicate that the proposed approach outperforms the baselines and the state-of-the-art approaches in centralized and federated meta-learning settings.

pdf bib abs
GenEx: A Commonsense-aware Unified Generative Framework for Explainable Cyberbullying Detection
Krishanu Maity | Raghav Jain | Prince Jha | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

With the rise of social media and online communication, the issue of cyberbullying has gained significant prominence. While extensive research is being conducted to develop more effective models for detecting cyberbullying in monolingual languages, a significant gap exists in understanding code-mixed languages and the need for explainability in this context. To address this gap, we have introduced a novel benchmark dataset named BullyExplain for explainable cyberbullying detection in code-mixed language. In this dataset, each post is meticulously annotated with four labels: bully, sentiment, target, and rationales, indicating the specific phrases responsible for identifying the post as a bully. Our current research presents an innovative unified generative framework, GenEx, which reimagines the multitask problem as a text-to-text generation task. Our proposed approach demonstrates its superiority across various evaluation metrics when applied to the BullyExplain dataset, surpassing other baseline models and current state-of-the-art approaches.

pdf bib abs
Can you Summarize my learnings? Towards Perspective-based Educational Dialogue Summarization
Raghav Jain | Tulika Saha | Jhagrut Lalwani | Sriparna Saha
Findings of the Association for Computational Linguistics: EMNLP 2023

The steady increase in the utilization of Virtual Tutors (VT) over recent years has allowed for a more efficient, personalized, and interactive AI-based learning experiences. A vital aspect in these educational chatbots is summarizing the conversations between the VT and the students, as it is critical in consolidating learning points and monitoring progress. However, the approach to summarization should be tailored according to the perspective. Summarization from the VTs perspective should emphasize on its teaching efficiency and potential improvements. Conversely, student-oriented summaries should distill learning points, track progress, and suggest scope for improvements. Based on this hypothesis, in this work, we propose a new task of Multi-modal Perspective based Dialogue Summarization (MM-PerSumm), demonstrated in an educational setting. Towards this aim, we introduce a novel dataset, CIMA-Summ that summarizes educational dialogues from three unique perspectives: the Student, the Tutor, and a Generic viewpoint. In addition, we propose an Image and Perspective-guided Dialogue Summarization (IP-Summ) model which is a Seq2Seq language model incorporating (i) multi-modal learning from images and (ii) a perspective-based encoder that constructs a dialogue graph capturing the intentions and actions of both the VT and the student, enabling the summarization of a dialogue from diverse perspectives. Lastly, we conduct detailed analyses of our model’s performance, highlighting the aspects that could lead to optimal modeling of IP-Summ.

pdf bib
Reimagining Complaint Analysis: Adopting Seq2Path for a Generative Text-to-Text Framework
Apoorva Singh | Raghav Jain | Sriparna Saha
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Sentiment Aided Graph Attentive Contextualization for Task Oriented Negotiation Dialogue Generation
Aritra Raut | Sriparna Saha | Anutosh Maitra | Roshni Ramnani
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations
Sriparna Saha | Herry Sujaini
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations

2022

pdf bib abs
Meta-Learning based Deferred Optimisation for Sentiment and Emotion aware Multi-modal Dialogue Act Classification
Tulika Saha | Aditya Prakash Patra | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Dialogue Act Classification (DAC) that determines the communicative intention of an utterance has been investigated widely over the years as a standalone task. But the emotional state of the speaker has a considerable effect on its pragmatic content. Sentiment as a human behavior is also closely related to emotion and one aids in the better understanding of the other. Thus, their role in identification of DAs needs to be explored. As a first step, we extend the newly released multi-modal EMOTyDA dataset to enclose sentiment tags for each utterance. In order to incorporate these multiple aspects, we propose a Dual Attention Mechanism (DAM) based multi-modal, multi-tasking conversational framework. The DAM module encompasses intra-modal and interactive inter-modal attentions with multiple loss optimization at various hierarchies to fuse multiple modalities efficiently and learn generalized features across all the tasks. Additionally, to counter the class-imbalance issue in dialogues, we introduce a 2-step Deferred Optimisation Schedule (DOS) that involves Meta-Net (MN) learning and deferred re-weighting where the former helps to learn an explicit weighting function from data automatically and the latter deploys a re-weighted multi-task loss with a smaller learning rate. Empirically, we establish that the joint optimisation of multi-modal DAC, SA and ER tasks along with the incorporation of 2-step DOS and MN learning produces better results compared to its different counterparts and outperforms state-of-the-art model.

pdf bib abs
Persona or Context? Towards Building Context adaptive Personalized Persuasive Virtual Sales Assistant
Abhisek Tiwari | Sriparna Saha | Shubhashis Sengupta | Anutosh Maitra | Roshni Ramnani | Pushpak Bhattacharyya
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Task-oriented conversational agents are gaining immense popularity and success in a wide range of tasks, from flight ticket booking to online shopping. However, the existing systems presume that end-users will always have a pre-determined and servable task goal, which results in dialogue failure in hostile scenarios, such as goal unavailability. On the other hand, human agents accomplish users’ tasks even in a large number of goal unavailability scenarios by persuading them towards a very similar and servable goal. Motivated by the limitation, we propose and build a novel end-to-end multi-modal persuasive dialogue system incorporated with a personalized persuasive module aided goal controller and goal persuader. The goal controller recognizes goal conflicting/unavailability scenarios and formulates a new goal, while the goal persuader persuades users using a personalized persuasive strategy identified through dialogue context. We also present a novel automatic evaluation metric called Persuasiveness Measurement Rate (PMeR) for quantifying the persuasive capability of a conversational agent. The obtained improvements (both quantitative and qualitative) firmly establish the superiority and need of the proposed context-guided, personalized persuasive virtual agent over existing traditional task-oriented virtual agents. Furthermore, we also curated a multi-modal persuasive conversational dialogue corpus annotated with intent, slot, sentiment, and dialogue act for e-commerce domain.

pdf bib abs
Topic-aware Multimodal Summarization
Sourajit Mukherjee | Anubhav Jangra | Sriparna Saha | Adam Jatowt
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022

Multimodal Summarization (MS) has attracted research interest in the past few years due to the ease with which users perceive multimodal summaries. It is important for MS models to consider the topic a given target content belongs to. In the current paper, we propose a topic-aware MS system which performs two tasks simultaneously: differentiating the images into “on-topic” and “off-topic” categories and further utilizing the “on-topic” images to generate multimodal summaries. The hypothesis is that, the proposed topic similarity classifier will help in generating better multimodal summary by focusing on important components of images and text which are specific to a particular topic. To develop the topic similarity classifier, we have augmented the existing popular MS data set, MSMO, with similar “on-topic” and dissimilar “off-topic” images for each sample. Our experimental results establish that the focus on “on-topic” features helps in generating topic-aware multimodal summaries, which outperforms the state of the art approach by 1.7 % in ROUGE-L metric.

Verb Phrase Anaphora (VPA) is a universal language phenomenon. It can occur in the form of do so phrase, verb phrase ellipsis, etc. Resolving VPA can improve the performance of Dialogue processing systems, Natural Language Generation (NLG), Question Answering (QA) and so on. In this paper, we present a novel computational approach to resolve the specific verb phrase anaphora appearing as do so construct and its lexical variations for the English language. The approach follows a heuristic technique using a combination of parsing from classical NLP, state-of-the-art (SOTA) Generative Pre-trained Transformer (GPT) language model and RoBERTa grammar correction model. The result indicates that our approach can resolve these specific verb phrase anaphora cases with 73.40 F1 score. The data set used for testing the specific verb phrase anaphora cases of do so and doing so is released for research purposes. This module has been used as the last module in a coreference resolution pipeline for a downstream QA task for the electronic home appliances sector.

pdf bib abs
MAKED: Multi-lingual Automatic Keyword Extraction Dataset
Yash Verma | Anubhav Jangra | Sriparna Saha | Adam Jatowt | Dwaipayan Roy
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Keyword extraction is an integral task for many downstream problems like clustering, recommendation, search and classification. Development and evaluation of keyword extraction techniques require an exhaustive dataset; however, currently, the community lacks large-scale multi-lingual datasets. In this paper, we present MAKED, a large-scale multi-lingual keyword extraction dataset comprising of 540K+ news articles from British Broadcasting Corporation News (BBC News) spanning 20 languages. It is the first keyword extraction dataset for 11 of these 20 languages. The quality of the dataset is examined by experimentation with several baselines. We believe that the proposed dataset will help advance the field of automatic keyword extraction given its size, diversity in terms of languages used, topics covered and time periods as well as its focus on under-studied languages.

pdf bib abs
CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts
Muskan Garg | Chandni Saxena | Sriparna Saha | Veena Krishnan | Ruchi Joshi | Vijay Mago
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The social NLP researchers and mental health practitioners have witnessed exponential growth in the field of mental health detection and analysis on social media. It has become important to identify the reason behind mental illness. In this context, we introduce a new dataset for Causal Analysis of Mental health in Social media posts (CAMS). We first introduce the annotation schema for this task of causal analysis. The causal analysis comprises of two types of annotations, viz, causal interpretation and causal categorization. We show the efficacy of our scheme in two ways: (i) crawling and annotating 3155 Reddit data and (ii) re-annotate the publicly available SDCNL dataset of 1896 instances for interpretable causal analysis. We further combine them as CAMS dataset and make it available along with the other source codes https://anonymous.4open.science/r/CAMS1/. Our experimental results show that the hybrid CNN-LSTM model gives the best performance over CAMS dataset.

pdf bib abs
Many Hands Make Light Work: Using Essay Traits to Automatically Score Essays
Rahul Kumar | Sandeep Mathias | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Most research in the area of automatic essay grading (AEG) is geared towards scoring the essay holistically while there has also been little work done on scoring individual essay traits. In this paper, we describe a way to score essays using a multi-task learning (MTL) approach, where scoring the essay holistically is the primary task, and scoring the essay traits is the auxiliary task. We compare our results with a single-task learning (STL) approach, using both LSTMs and BiLSTMs. To find out which traits work best for different types of essays, we conduct ablation tests for each of the essay traits. We also report the runtime and number of training parameters for each system. We find that MTL-based BiLSTM system gives the best results for scoring the essay holistically, as well as performing well on scoring the essay traits. The MTL systems also give a speed-up of between 2.30 to 3.70 times the speed of the STL system, when it comes to scoring the essay and all the traits.

pdf bib abs
A Shoulder to Cry on: Towards A Motivational Virtual Assistant for Assuaging Mental Agony
Tulika Saha | Saichethan Reddy | Anindya Das | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Mental Health Disorders continue plaguing humans worldwide. Aggravating this situation is the severe shortage of qualified and competent mental health professionals (MHPs), which underlines the need for developing Virtual Assistants (VAs) that can assist MHPs. The data+ML for automation can come from platforms that allow visiting and posting messages in peer-to-peer anonymous manner for sharing their experiences (frequently stigmatized) and seeking support. In this paper, we propose a VA that can act as the first point of contact and comfort for mental health patients. We curate a dataset, Motivational VA: MotiVAte comprising of 7k dyadic conversations collected from a peer-to-peer support platform. The system employs two mechanisms: (i) Mental Illness Classification: an attention based BERT classifier that outputs the mental disorder category out of the 4 categories, viz., Major Depressive Disorder (MDD), Anxiety, Obsessive Compulsive Disorder (OCD) and Post-traumatic Stress Disorder (PTSD), based on the input ongoing dialog between the support seeker and the VA; and (ii) Mental Illness Conditioned Motivational Dialogue Generation (MI-MDG): a sentiment driven Reinforcement Learning (RL) based motivational response generator. The empirical evaluation demonstrates the system capability by way of outperforming several baselines.

pdf bib
A Deep Learning based Framework for Image Paragraph Generation in Hindi
Santosh Kumar Mishra | Sushant Sinha | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

2021

pdf bib
Multimodal Graph-based Transformer Framework for Biomedical Relation Extraction
Sriram Pingali | Shweta Yadav | Pratik Dutta | Sriparna Saha
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs
A Scaled Encoder Decoder Network for Image Captioning in Hindi
Santosh Kumar Mishra | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Image captioning is a prominent research area in computer vision and natural language processing, which automatically generates natural language descriptions for images. Most of the existing works have focused on developing models for image captioning in the English language. The current paper introduces a novel deep learning architecture based on encoder-decoder with an attention mechanism for image captioning in the Hindi language. For encoder, decoder, and attention, several deep learning-based architectures have been explored. Hindi, the fourth-most spoken language globally, is widely spoken in India and South Asia and is one of India’s official languages. The proposed encoder-decoder architecture utilizes scaling in convolution neural networks to achieve better accuracy than state-of-the-art image captioning methods in Hindi. The proposed method’s performance is compared with state-of-the-art methods in terms of BLEU scores and manual evaluation (in terms of adequacy and fluency). The obtained results demonstrate the efficacy of the proposed method.

pdf bib abs
Wikipedia Current Events Summarization using Particle Swarm Optimization
Santosh Kumar Mishra | Darsh Kaushik | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

This paper proposes a method to summarize news events from multiple sources. We pose event summarization as a clustering-based optimization problem and solve it using particle swarm optimization. The proposed methodology uses the search capability of particle swarm optimization, detecting the number of clusters automatically. Experiments are conducted with the Wikipedia Current Events Portal dataset and evaluated using the well-known ROUGE-1, ROUGE-2, and ROUGE-L scores. The obtained results show the efficacy of the proposed methodology over the state-of-the-art methods. It attained improvement of 33.42%, 81.75%, and 57.58% in terms of ROUGE-1, ROUGE-2, and ROUGE-L, respectively.

pdf bib abs
Towards Sentiment and Emotion aided Multi-modal Speech Act Classification in Twitter
Tulika Saha | Apoorva Upadhyaya | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Speech Act Classification determining the communicative intent of an utterance has been investigated widely over the years as a standalone task. This holds true for discussion in any fora including social media platform such as Twitter. But the emotional state of the tweeter which has a considerable effect on the communication has not received the attention it deserves. Closely related to emotion is sentiment, and understanding of one helps understand the other. In this work, we firstly create a new multi-modal, emotion-TA (‘TA’ means tweet act, i.e., speech act in Twitter) dataset called EmoTA collected from open-source Twitter dataset. We propose a Dyadic Attention Mechanism (DAM) based multi-modal, adversarial multi-tasking framework. DAM incorporates intra-modal and inter-modal attention to fuse multiple modalities and learns generalized features across all the tasks. Experimental results indicate that the proposed framework boosts the performance of the primary task, i.e., TA classification (TAC) by benefitting from the two secondary tasks, i.e., Sentiment and Emotion Analysis compared to its uni-modal and single task TAC (tweet act classification) variants.

2020

pdf bib abs
Towards Emotion-aided Multi-modal Dialogue Act Classification
Tulika Saha | Aditya Patra | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The task of Dialogue Act Classification (DAC) that purports to capture communicative intent has been studied extensively. But these studies limit themselves to text. Non-verbal features (change of tone, facial expressions etc.) can provide cues to identify DAs, thus stressing the benefit of incorporating multi-modal inputs in the task. Also, the emotional state of the speaker has a substantial effect on the choice of the dialogue act, since conversations are often influenced by emotions. Hence, the effect of emotion too on automatic identification of DAs needs to be studied. In this work, we address the role of both multi-modality and emotion recognition (ER) in DAC. DAC and ER help each other by way of multi-task learning. One of the major contributions of this work is a new dataset- multimodal Emotion aware Dialogue Act dataset called EMOTyDA, collected from open-sourced dialogue datasets. To demonstrate the utility of EMOTyDA, we build an attention based (self, inter-modal, inter-task) multi-modal, multi-task Deep Neural Network (DNN) for joint learning of DAs and emotions. We show empirically that multi-modality and multi-tasking achieve better performance of DAC compared to uni-modal and single task DAC variants.

pdf bib abs
Amalgamation of protein sequence, structure and textual information for improving protein-protein interaction identification
Pratik Dutta | Sriparna Saha
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

An in-depth exploration of protein-protein interactions (PPI) is essential to understand the metabolism in addition to the regulations of biological entities like proteins, carbohydrates, and many more. Most of the recent PPI tasks in BioNLP domain have been carried out solely using textual data. In this paper, we argue that incorporating multimodal cues can improve the automatic identification of PPI. As a first step towards enabling the development of multimodal approaches for PPI identification, we have developed two multi-modal datasets which are extensions and multi-modal versions of two popular benchmark PPI corpora (BioInfer and HRPD50). Besides, existing textual modalities, two new modalities, 3D protein structure and underlying genomic sequence, are also added to each instance. Further, a novel deep multi-modal architecture is also implemented to efficiently predict the protein interactions from the developed datasets. A detailed experimental analysis reveals the superiority of the multi-modal approach in comparison to the strong baselines including unimodal approaches and state-of the-art methods over both the generated multi-modal datasets. The developed multi-modal datasets are available for use at https://github.com/sduttap16/MM_PPI_NLP.

pdf bib abs
Semantic Extractor-Paraphraser based Abstractive Summarization
Anubhav Jangra | Raghav Jain | Vaibhav Mavi | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

The anthology of spoken languages today is inundated with textual information, necessitating the development of automatic summarization models. In this manuscript, we propose an extractor-paraphraser based abstractive summarization system that exploits semantic overlap as opposed to its predecessors that focus more on syntactic information overlap. Our model outperforms the state-of-the-art baselines in terms of ROUGE, METEOR and word mover similarity (WMS), establishing the superiority of the proposed system via extensive ablation experiments. We have also challenged the summarization capabilities of the state of the art Pointer Generator Network (PGN), and through thorough experimentation, shown that PGN is more of a paraphraser, contrary to the prevailing notion of a summarizer; illustrating it’s incapability to accumulate information across multiple sentences.

pdf bib abs
A Multi-modal Personality Prediction System
Chanchal Suman | Aditya Gupta | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Automatic prediction of personality traits has many real-life applications, e.g., in forensics, recommender systems, personalized services etc.. In this work, we have proposed a solution framework for solving the problem of predicting the personality traits of a user from videos. Ambient, facial and the audio features are extracted from the video of the user. These features are used for the final output prediction. The visual and audio modalities are combined in two different ways: averaging of predictions obtained from the individual modalities, and concatenation of features in multi-modal setting. The dataset released in Chalearn-16 is used for evaluating the performance of the system. Experimental results illustrate that it is possible to obtain better performance with a hand full of images, rather than using all the images present in the video

pdf bib abs
D-Coref: A Fast and Lightweight Coreference Resolution Model using DistilBERT
Chanchal Suman | Jeetu Kumar | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

Smart devices are often deployed in some edge-devices, which require quality solutions in limited amount of memory usage. In most of the user-interaction based smart devices, coreference resolution is often required. Keeping this in view, we have developed a fast and lightweight coreference resolution model which meets the minimum memory requirement and converges faster. In order to generate the embeddings for solving the task of coreference resolution, DistilBERT, a light weight BERT module is utilized. DistilBERT consumes less memory (only 60% of memory in comparison to BERT-based heavy model) and it is suitable for deployment in edge devices. DistilBERT embedding helps in 60% faster convergence with an accuracy compromise of 2.59%, and 6.49% with respect to its base model and current state-of-the-art, respectively.

Emotion recognition is a very well-attended problem in Natural Language Processing (NLP). Most of the existing works on emotion recognition focus on the general domain and in some cases to specific domains like fairy tales, blogs, weather, Twitter etc. But emotion analysis systems in the domains of security, social issues, technology, politics, sports, etc. are very rare. In this paper, we create a benchmark setup for emotion recognition in these specialised domains. First, we construct a corpus of 18,921 tweets in English annotated with Paul Ekman’s six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise) and a non-emotive class Others. Thereafter, we propose a deep neural framework to perform emotion recognition in an end-to-end setting. We build various models based on Convolutional Neural Network (CNN), Bi-directional Long Short Term Memory (Bi-LSTM), Bi-directional Gated Recurrent Unit (Bi-GRU). We propose a Hierarchical Attention-based deep neural network for Emotion Detection (HAtED). We also develop multiple systems by considering different sets of emotion classes for each system and report the detailed comparative analysis of the results. Experiments show the hierarchical attention-based model achieves best results among the considered baselines with accuracy of 69%.

pdf bib abs
IIITBH-IITP@CL-SciSumm20, CL-LaySumm20, LongSumm20
Saichethan Reddy | Naveen Saini | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the First Workshop on Scholarly Document Processing

In this paper, we present the IIIT Bhagalpur and IIT Patna team’s effort to solve the three shared tasks namely, CL-SciSumm 2020, CL-LaySumm 2020, LongSumm 2020 at SDP 2020. The theme of these tasks is to generate medium-scale, lay and long summaries, respectively, for scientific articles. For the first two tasks, unsupervised systems are developed, while for the third one, we develop a supervised system. The performances of all the systems were evaluated on the associated datasets with the shared tasks in term of well-known ROUGE metric.

pdf bib abs
IITP-AI-NLP-ML@ CL-SciSumm 2020, CL-LaySumm 2020, LongSumm 2020
Santosh Kumar Mishra | Harshavardhan Kundarapu | Naveen Saini | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the First Workshop on Scholarly Document Processing

The publication rate of scientific literature increases rapidly, which poses a challenge for researchers to keep themselves updated with new state-of-the-art. Scientific document summarization solves this problem by summarizing the essential fact and findings of the document. In the current paper, we present the participation of IITP-AI-NLP-ML team in three shared tasks, namely, CL-SciSumm 2020, LaySumm 2020, LongSumm 2020, which aims to generate medium, lay, and long summaries of the scientific articles, respectively. To solve CL-SciSumm 2020 and LongSumm 2020 tasks, three well-known clustering techniques are used, and then various sentence scoring functions, including textual entailment, are used to extract the sentences from each cluster for a summary generation. For LaySumm 2020, an encoder-decoder based deep learning model has been utilized. Performances of our developed systems are evaluated in terms of ROUGE measures on the associated datasets with the shared task.

pdf bib abs
EL-BERT at SemEval-2020 Task 10: A Multi-Embedding Ensemble Based Approach for Emphasis Selection in Visual Media
Chandresh Kanani | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In visual media, text emphasis is the strengthening of words in a text to convey the intent of the author. Text emphasis in visual media is generally done by using different colors, backgrounds, or font for the text; it helps in conveying the actual meaning of the message to the readers. Emphasis selection is the task of choosing candidate words for emphasis, it helps in automatically designing posters and other media contents with written text. If we consider only the text and do not know the intent, then there can be multiple valid emphasis selections. We propose the use of ensembles for emphasis selection to improve over single emphasis selection models. We show that the use of multi-embedding helps in enhancing the results for base models. To show the efficacy of proposed approach we have also done a comparison of our results with state-of-the-art models.

pdf bib abs
CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media
Sayanta Paul | Sriparna Saha | Mohammed Hasanuzzaman
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The SemEval-2020 Task 12 (OffensEval) challenge focuses on detection of signs of offensiveness using posts or comments over social media. This task has been organized for several languages, e.g., Arabic, Danish, English, Greek and Turkish. It has featured three related sub-tasks for English language: sub-task A was to discriminate between offensive and non-offensive posts, the focus of sub-task B was on the type of offensive content in the post and finally, in sub-task C, proposed systems had to identify the target of the offensive posts. The corpus for each of the languages is developed using the posts and comments over Twitter, a popular social media platform. We have participated in this challenge and submitted results for different languages. The current work presents different machine learning and deep learning techniques and analyzes their performance for offensiveness prediction which involves various classifiers and feature engineering schemes. The experimental analysis on the training set shows that SVM using language specific pre-trained word embedding (Fasttext) outperforms the other methods. Our system achieves a macro-averaged F1 score of 0.45 for Arabic language, 0.43 for Greek language and 0.54 for Turkish language.

2019

pdf bib abs
A Unified Multi-task Adversarial Learning Framework for Pharmacovigilance Mining
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The mining of adverse drug reaction (ADR) has a crucial role in the pharmacovigilance. The traditional ways of identifying ADR are reliable but time-consuming, non-scalable and offer a very limited amount of ADR relevant information. With the unprecedented growth of information sources in the forms of social media texts (Twitter, Blogs, Reviews etc.), biomedical literature, and Electronic Medical Records (EMR), it has become crucial to extract the most pertinent ADR related information from these free-form texts. In this paper, we propose a neural network inspired multi- task learning framework that can simultaneously extract ADRs from various sources. We adopt a novel adversarial learning-based approach to learn features across multiple ADR information sources. Unlike the other existing techniques, our approach is capable to extracting fine-grained information (such as ‘Indications’, ‘Symptoms’, ‘Finding’, ‘Disease’, ‘Drug’) which provide important cues in pharmacovigilance. We evaluate our proposed approach on three publicly available real- world benchmark pharmacovigilance datasets, a Twitter dataset from PSB 2016 Social Me- dia Shared Task, CADEC corpus and Medline ADR corpus. Experiments show that our unified framework achieves state-of-the-art performance on individual tasks associated with the different benchmark datasets. This establishes the fact that our proposed approach is generic, which enables it to achieve high performance on the diverse datasets.

2018

pdf bib abs
Incorporating Deep Visual Features into Multiobjective based Multi-view Search Results Clustering
Sayantan Mitra | Mohammed Hasanuzzaman | Sriparna Saha | Andy Way
Proceedings of the 27th International Conference on Computational Linguistics

Current paper explores the use of multi-view learning for search result clustering. A web-snippet can be represented using multiple views. Apart from textual view cued by both the semantic and syntactic information, a complimentary view extracted from images contained in the web-snippets is also utilized in the current framework. A single consensus partitioning is finally obtained after consulting these two individual views by the deployment of a multiobjective based clustering technique. Several objective functions including the values of a cluster quality measure measuring the goodness of partitionings obtained using different views and an agreement-disagreement index, quantifying the amount of oneness among multiple views in generating partitionings are optimized simultaneously using AMOSA. In order to detect the number of clusters automatically, concepts of variable length solutions and a vast range of permutation operators are introduced in the clustering process. Finally, a set of alternative partitioning are obtained on the final Pareto front by the proposed multi-view based multiobjective technique. Experimental results by the proposed approach on several benchmark test datasets of SRC with respect to different performance metrics evidently establish the power of visual and text-based views in achieving better search result clustering.

pdf bib
Medical Sentiment Analysis using Social Media: Towards building a Patient Assisted System
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib abs
Multi-Task Learning Framework for Mining Crowd Intelligence towards Clinical Treatment
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya | Amit Sheth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

In recent past, social media has emerged as an active platform in the context of healthcare and medicine. In this paper, we present a study where medical user’s opinions on health-related issues are analyzed to capture the medical sentiment at a blog level. The medical sentiments can be studied in various facets such as medical condition, treatment, and medication that characterize the overall health status of the user. Considering these facets, we treat analysis of this information as a multi-task classification problem. In this paper, we adopt a novel adversarial learning approach for our multi-task learning framework to learn the sentiment’s strengths expressed in a medical blog. Our evaluation shows promising results for our target tasks.

2017

pdf bib abs
Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Text mining has drawn significant attention in recent past due to the rapid growth in biomedical and clinical records. Entity extraction is one of the fundamental components for biomedical text mining. In this paper, we propose a novel approach of feature selection for entity extraction that exploits the concept of deep learning and Particle Swarm Optimization (PSO). The system utilizes word embedding features along with several other features extracted by studying the properties of the datasets. We obtain an interesting observation that compact word embedding features as determined by PSO are more effective compared to the entire word embedding feature set for entity extraction. The proposed system is evaluated on three benchmark biomedical datasets such as GENIA, GENETAG, and AiMed. The effectiveness of the proposed approach is evident with significant performance gains over the baseline models as well as the other existing systems. We observe improvements of 7.86%, 5.27% and 7.25% F-measure points over the baseline models for GENIA, GENETAG, and AiMed dataset respectively.

pdf bib abs
Temporal Orientation of Tweets for Predicting Income of Users
Mohammed Hasanuzzaman | Sabyasachi Kamila | Mandeep Kaur | Sriparna Saha | Asif Ekbal
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Automatically estimating a user’s socio-economic profile from their language use in social media can significantly help social science research and various downstream applications ranging from business to politics. The current paper presents the first study where user cognitive structure is used to build a predictive model of income. In particular, we first develop a classifier using a weakly supervised learning framework to automatically time-tag tweets as past, present, or future. We quantify a user’s overall temporal orientation based on their distribution of tweets, and use it to build a predictive model of income. Our analysis uncovers a correlation between future temporal orientation and income. Finally, we measure the predictive power of future temporal orientation on income by performing regression.

2016

pdf bib abs
Semi-supervised Clustering of Medical Text
Pracheta Sahoo | Asif Ekbal | Sriparna Saha | Diego Mollá | Kaushik Nandan
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Semi-supervised clustering is an attractive alternative for traditional (unsupervised) clustering in targeted applications. By using the information of a small annotated dataset, semi-supervised clustering can produce clusters that are customized to the application domain. In this paper, we present a semi-supervised clustering technique based on a multi-objective evolutionary algorithm (NSGA-II-clus). We apply this technique to the task of clustering medical publications for Evidence Based Medicine (EBM) and observe an improvement of the results against unsupervised and other semi-supervised clustering techniques.

pdf bib abs
Deep Learning Architecture for Patient Data De-identification in Clinical Records
Shweta Yadav | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Rapid growth in Electronic Medical Records (EMR) has emerged to an expansion of data in the clinical domain. The majority of the available health care information is sealed in the form of narrative documents which form the rich source of clinical information. Text mining of such clinical records has gained huge attention in various medical applications like treatment and decision making. However, medical records enclose patient Private Health Information (PHI) which can reveal the identities of the patients. In order to retain the privacy of patients, it is mandatory to remove all the PHI information prior to making it publicly available. The aim is to de-identify or encrypt the PHI from the patient medical records. In this paper, we propose an algorithm based on deep learning architecture to solve this problem. We perform de-identification of seven PHI terms from the clinical records. Experiments on benchmark datasets show that our proposed approach achieves encouraging performance, which is better than the baseline model developed with Conditional Random Field.

pdf bib
Improving Document Ranking using Query Expansion and Classification Techniques for Mixed Script Information Retrieval
Subham Kumar | Anwesh Sinha Ray | Sabyasachi Kamila | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

pdf bib
A Recurrent Neural Network Architecture for De-identifying Clinical Records
Shweta | Ankit Kumar | Asif Ekbal | Sriparna Saha | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

In this paper, we propose classifier ensemble selection for Named Entity Recognition (NER) as a single objective optimization problem. Thereafter, we develop a method based on genetic algorithm (GA) to solve this problem. Our underlying assumption is that rather than searching for the best feature set for a particular classifier, ensembling of several classifiers which are trained using different feature representations could be a more fruitful approach. Maximum Entropy (ME) framework is used to generate a number of classifiers by considering the various combinations of the available features. In the proposed approach, classifiers are encoded in the chromosomes. A single measure of classification quality, namely F-measure is used as the objective function. Evaluation results on a resource constrained language like Bengali yield the recall, precision and F-measure values of 71.14%, 84.07% and 77.11%, respectively. Experiments also show that the classifier ensemble identified by the proposed GA based approach attains higher performance than all the individual classifiers and two different conventional baseline ensembles.

pdf bib
Finding Appropriate Subset of Votes Per Classifier Using Multiobjective Optimization: Application to Named Entity Recognition
Asif Ekbal | Sriparna Saha | Md. Hasanuzzaman
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

pdf bib
Feature Subset Selection Using Genetic Algorithm for Named Entity Recognition
Md. Hasanuzzaman | Sriparna Saha | Asif Ekbal
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation