Debarati Das


2024

pdf bib
Which Modality should I use - Text, Motif, or Image? : Understanding Graphs with Large Language Models
Debarati Das | Ishaan Gupta | Jaideep Srivastava | Dongyeop Kang
Findings of the Association for Computational Linguistics: NAACL 2024

Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints. This paper introduces a new approach to encoding a graph with diverse modalities, such as text, image, and motif, coupled with prompts to approximate a graph’s global connectivity, thereby enhancing LLMs’ efficiency in processing complex graph structures. The study also presents GraphTMI, a novel benchmark for evaluating LLMs in graph structure analysis, focusing on homophily, motif presence, and graph difficulty. Key findings indicate that the image modality, especially with vision-language models like GPT-4V, is superior to text in balancing token limits and preserving essential information and comes close to prior graph neural net (GNN) encoders. Furthermore, the research assesses how various factors affect the performance of each encoding modality and outlines the existing challenges and potential future developments for LLMs in graph understanding and reasoning tasks. Our code and data are publicly available on our project page - https://minnesotanlp.github.io/GraphLLM/

2023

pdf bib
Balancing the Effect of Training Dataset Distribution of Multiple Styles for Multi-Style Text Transfer
Debarati Das | David Ma | Dongyeop Kang
Findings of the Association for Computational Linguistics: ACL 2023

Text style transfer is an exciting task within the field of natural language generation that is often plagued by the need for high-quality paired datasets. Furthermore, training a model for multi-attribute text style transfer requires datasets with sufficient support across all combinations of the considered stylistic attributes, adding to the challenges of training a style transfer model. This paper explores the impact of training data input diversity on the quality of the generated text from the multi-style transfer model. We construct a pseudo-parallel dataset by devising heuristics to adjust the style distribution in the training samples. We balance our training dataset using marginal and joint distributions to train our style transfer models. We observe that a balanced dataset produces more effective control effects over multiple styles than an imbalanced or skewed one. Through quantitative analysis, we explore the impact of multiple style distributions in training data on style-transferred output. These findings will better inform the design of style-transfer datasets.

2022

pdf bib
AdBERT: An Effective Few Shot Learning Framework for Aligning Tweets to Superbowl Advertisements
Debarati Das | Roopana Chenchu | Maral Abdollahi | Jisu Huh | Jaideep Srivastava
Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)

The tremendous increase in social media usage for sharing Television (TV) experiences has provided a unique opportunity in the Public Health and Marketing sectors to understand viewer engagement and attitudes through viewer-generated content on social media. However, this opportunity also comes with associated technical challenges. Specifically, given a televised event and related tweets about this event, we need methods to effectively align these tweets and the corresponding event. In this paper, we consider the specific ecosystem of the Superbowl 2020 and map viewer tweets to advertisements they are referring to. Our proposed model, AdBERT, is an effective few-shot learning framework that is able to handle the technical challenges of establishing ad-relatedness, class imbalance as well as the scarcity of labeled data. As part of this study, we have curated and developed two datasets that can prove to be useful for Social TV research: 1) dataset of ad-related tweets and 2) dataset of ad descriptions of Superbowl advertisements. Explaining connections to SentenceBERT, we describe the advantages of AdBERT that allow us to make the most out of a challenging and interesting dataset which we will open-source along with the models developed in this paper.

2016

pdf bib
A Computational Analysis of Mahabharata
Debarati Das | Bhaskarjyoti Das | Kavi Mahesh
Proceedings of the 13th International Conference on Natural Language Processing