Jaideep Srivastava


2024

Which Modality should I use - Text, Motif, or Image? : Understanding Graphs with Large Language Models
Debarati Das | Ishaan Gupta | Jaideep Srivastava | Dongyeop Kang
Findings of the Association for Computational Linguistics: NAACL 2024

Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints. This paper introduces a new approach to encoding a graph with diverse modalities, such as text, image, and motif, coupled with prompts to approximate a graph’s global connectivity, thereby enhancing LLMs’ efficiency in processing complex graph structures. The study also presents GraphTMI, a novel benchmark for evaluating LLMs in graph structure analysis, focusing on homophily, motif presence, and graph difficulty. Key findings indicate that the image modality, especially with vision-language models like GPT-4V, is superior to text in balancing token limits and preserving essential information, and comes close to prior graph neural net (GNN) encoders. Furthermore, the research assesses how various factors affect the performance of each encoding modality and outlines the existing challenges and potential future developments for LLMs in graph understanding and reasoning tasks. Our code and data are publicly available on our project page: https://minnesotanlp.github.io/GraphLLM/

SkOTaPA: A Dataset for Skepticism Detection in Online Text after Persuasion Attempt
Smitha Muthya Sudheendra | Maral Abdollahi | Dongyeop Kang | Jisu Huh | Jaideep Srivastava
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Individuals often encounter persuasion attempts, during which a persuasion agent aims to persuade a target to change the target’s emotions, beliefs, and behaviors. These persuasion attempts can be observed in various social settings, such as advertising, public health, political campaigns, and personal relationships. During these persuasion attempts, targets generally like to preserve their autonomy, so their responses often manifest in some form of resistance, like a skeptical reaction. In order to detect such skepticism in response to persuasion attempts on social media, we developed a corpus based on consumer psychology. We consider one of the most prominent areas in which persuasion attempts unfold: social media influencer marketing. In this paper, we introduce the skepticism detection corpus, SkOTaPA, which was developed using multiple independent human annotations; inter-coder reliability was evaluated with Krippendorff’s alpha (0.709). We performed validity tests to show that skepticism cannot be detected using other potential proxy variables such as sentiment and sarcasm.

2022

AdBERT: An Effective Few Shot Learning Framework for Aligning Tweets to Superbowl Advertisements
Debarati Das | Roopana Chenchu | Maral Abdollahi | Jisu Huh | Jaideep Srivastava
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)

The tremendous increase in social media usage for sharing Television (TV) experiences has provided a unique opportunity in the Public Health and Marketing sectors to understand viewer engagement and attitudes through viewer-generated content on social media. However, this opportunity also comes with associated technical challenges. Specifically, given a televised event and related tweets about this event, we need methods to effectively align these tweets with the corresponding event. In this paper, we consider the specific ecosystem of the Superbowl 2020 and map viewer tweets to the advertisements they refer to. Our proposed model, AdBERT, is an effective few-shot learning framework that is able to handle the technical challenges of establishing ad-relatedness, class imbalance, and the scarcity of labeled data. As part of this study, we have curated and developed two datasets that can prove useful for Social TV research: 1) a dataset of ad-related tweets and 2) a dataset of ad descriptions of Superbowl advertisements. Explaining connections to SentenceBERT, we describe the advantages of AdBERT that allow us to make the most of a challenging and interesting dataset, which we will open-source along with the models developed in this paper.

2019

Using Clinical Notes with Time Series Data for ICU Management
Swaraj Khadanga | Karan Aggarwal | Shafiq Joty | Jaideep Srivastava
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Monitoring patients in the ICU is a challenging and high-cost task. Hence, predicting the condition of patients during their ICU stay can help provide better acute care and plan the hospital’s resources. There has been continuous progress in machine learning research for ICU management, and most of this work has focused on using time series signals recorded by ICU instruments. In our work, we show that adding clinical notes as another modality improves the performance of the model for three benchmark tasks that play an important role in ICU management: in-hospital mortality prediction, modeling decompensation, and length-of-stay forecasting. While the time-series data is measured at regular intervals, doctor notes are charted at irregular times, making it challenging to model them together. We propose a method to model them jointly, achieving considerable improvement across benchmark tasks over a baseline time-series model.