Sujatha Das Gollapalli

Also published as: Sujatha Das, Sujatha Das Gollapalli

2026

Pro-QuEST: A Prompt-chain based Quiz Engine for testing Specialized Technical Product Knowledge
Sujatha Das Gollapalli | Mouad Hakam | Mingzhe Du | See-Kiong Ng | Mohammed Hamzeh
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

In today’s rapidly evolving large language model (LLM) landscape, technology companies such as Cisco face the difficult challenge of selecting the most suitable model for downstream tasks that demand deep, domain-specific product knowledge. Specialized benchmarks not only inform this decision making but can also be leveraged to rapidly create quizzes that can effectively train engineering and marketing personnel on novel product offerings in a continually growing Cisco product space. We present Pro-QuEST, our Prompt-chain based Quiz Engine using state-of-the-art LLMs for generating multiple-choice questions on Specialized Technical products. In Pro-QuEST, we first identify key terms and topics from a given professional certification textbook or product guide, and generate a series of multiple-choice questions using domain-knowledge guided prompts. We show LLM benchmarking results with the question benchmarks generated by Pro-QuEST using a range of the latest open-source and proprietary LLMs, and compare them with expert-created exams and review questions to derive insights on their composition and difficulty. Our experiments indicate that though there is room for improvement in Pro-QuEST to generate questions of the complexity levels seen in expert-designed certification exams, question-type based prompts provide a promising direction to address this limitation. In sample user studies with Cisco personnel, Pro-QuEST was received with high optimism for its practical usefulness in quickly compiling quizzes for self-assessment on knowledge of novel products in the rapidly changing tech sector.

NUS-IDS at AMIYA/VarDial 2026: Improving Arabic Dialectness in LLMs with Reinforcement Learning
Sujatha Das Gollapalli | Mouad Hakam | Mingzhe Du | See-Kiong Ng
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects

In this paper, we describe models developed by our team, NUS-IDS, for the Closed data track at the Arabic Modeling In Your Accent (AMIYA) shared task at VarDial 2026. The core idea behind our solution involves data augmentation enabled by a dialect classifier trained on AMIYA data. We effectively combine various translation, summarization, and question answering prompts with AMIYA training data to form dialectal prompts for use with state-of-the-art LLMs. Next, dialect predictions from our classifier on outputs from these LLMs are used to compile preference data for Reinforcement Learning (RL). We report model performance on dialectal Arabic from Egypt, Morocco, Palestine, Saudi Arabia and Syria using FLORES+, a multilingual machine translation dataset. Our experiments illustrate that though our RL models show significant performance gains on dialectness scores, they underperform on translation metrics such as chrF++ compared to base LLMs.

2025

On Assigning Product and Software Codes to Customer Service Requests with Large Language Models
Sujatha Das Gollapalli | Mouad Hakam | Mingzhe Du | See-Kiong Ng | Mohammed Hamzeh
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

In a technology company, quality of customer service that involves providing troubleshooting assistance and advice to customers is a crucial asset. Often, insights from historical customer service data are used to make decisions related to future product offerings. In this paper, we address the challenging problem of automatic assignment of product names and software version labels to customer Service Requests (SRs) related to BLIND, a company in the networking domain. We study the effectiveness of state-of-the-art Large Language Models (LLMs) in assigning the correct product name codes and software versions from several possible label options and their “non-canonical” mentions in the associated SR data. To this end, we frame the assignment as a multiple-choice question answering task instead of conventional prompts and devise, to our knowledge, a novel pipeline that employs a classifier to filter inputs to the LLM and thereby save usage costs. On our experimental dataset based on real SRs, we are able to correctly identify product name and software version labels, when they are mentioned, with over 90% accuracy while cutting LLM costs by ~40-60% on average, thus providing a viable solution for practical deployment.

PIRsuader: A Persuasive Chatbot for Mitigating Psychological Insulin Resistance in Type-2 Diabetic Patients
Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the 31st International Conference on Computational Linguistics

Psychological Insulin Resistance (PIR) is described as the reluctance towards initiation of and adherence to insulin-based treatments due to psychological barriers in diabetic patients. Though studies have shown that timely initiation, together with lifestyle changes, is crucial in sugar control and prevention of chronic conditions in Type 2 Diabetes (T2D) patients, many patients have deep-rooted fears and misgivings related to insulin which hinder them from adapting to an insulin-based treatment regimen when recommended by healthcare specialists. Therefore, it is vitally important to address and allay these fallacious beliefs in T2D patients and persuade them to consider insulin as a treatment option. In this paper, we describe the design of PIRsuader, a persuasive chatbot for mitigating PIR in T2D patients. In PIRsuader, we effectively harness the conversation generation capabilities of state-of-the-art Large Language Models via a context-specific persuasive dialog act schema. We design reward functions that capture dialog act preferences for persuading reluctant patients and apply reinforcement learning to learn a dialog act prediction model. Our experiments using a collection of real doctor-diabetic patient conversations indicate that PIRsuader is able to improve the willingness in patients to try insulin as well as address specific concerns they have in an empathetic manner.

2023

NUS-IDS at PragTag-2023: Improving Pragmatic Tagging of Peer Reviews through Unlabeled Data
Sujatha Das Gollapalli | Yixin Huang | See-Kiong Ng
Proceedings of the 10th Workshop on Argument Mining

We describe our models for the Pragmatic Tagging of Peer Reviews Shared Task at the 10th Workshop on Argument Mining at EMNLP-2023. We trained multiple sentence classification models for the above competition task by employing various state-of-the-art transformer models that can be fine-tuned either in the traditional way or through instruction-based fine-tuning. Multiple model predictions on unlabeled data are combined to tentatively label unlabeled instances and augment the dataset to further improve performance on the prediction task. In particular, on the F1000RD corpus, we perform on par with models trained on 100% of the training data while using only 10% of the data. Overall, on the competition datasets, we rank among the top-2 performers for the different data conditions.

Socratic Question Generation: A Novel Dataset, Models, and Evaluation
Beng Heng Ang | Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Socratic questioning is a form of reflective inquiry often employed in education to encourage critical thinking in students, and to elicit awareness of beliefs and perspectives in a subject during therapeutic counseling. Specific types of Socratic questions are employed for enabling reasoning and alternate views against the context of individual personal opinions on a topic. Socratic contexts are different from traditional question generation contexts where “answer-seeking” questions are generated against a given formal passage on a topic, narrative stories or conversations. We present SocratiQ, the first large dataset of 110K (question, context) pairs for enabling studies on Socratic Question Generation (SoQG). We provide an in-depth study on the various types of Socratic questions and present models for generating Socratic questions against a given context through prompt tuning. Our automated and human evaluation results demonstrate that our SoQG models can produce realistic, type-sensitive, human-like Socratic questions enabling potential applications in counseling and coaching.

2022

QSTS: A Question-Sensitive Text Similarity Measure for Question Generation
Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the 29th International Conference on Computational Linguistics

While question generation (QG) has received significant focus in conversation modeling and text generation research, the problems of comparing questions and evaluating QG models have remained inadequately addressed. Indeed, QG models continue to be evaluated using traditional measures such as BLEU, METEOR, and ROUGE scores, which were designed for other text generation problems. We propose QSTS, a novel Question-Sensitive Text Similarity measure for comparing two questions by characterizing their target intent based on question class, named-entity, and semantic similarity information from the two questions. We show that QSTS addresses several shortcomings of existing measures that depend on n-gram overlap scores and obtains superior results compared to traditional measures on publicly-available QG datasets. We also collect a novel dataset, SimQG, for enabling question similarity research in QG contexts. SimQG contains questions generated by state-of-the-art QG models along with human judgements on their relevance with respect to the passage context they were generated for as well as when compared to the given reference question. Using SimQG, we showcase the key aspect of QSTS that differentiates it from all existing measures. QSTS is not only able to characterize similarity between two questions, but is also able to score questions with respect to passage contexts. Thus QSTS is, to our knowledge, the first metric that enables the measurement of QG performance in a reference-free manner.

2021

On Generating Fact-Infused Question Variations
Arthur Deschamps | Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

To fully model the human-like ability to ask questions, automatic question generation (QG) models must be able to produce multiple expressions of the same question with different levels of detail. Unfortunately, existing datasets available for learning QG do not include paraphrases or question variations, which limits a model’s ability to learn this capability. We present FIRS, a dataset containing human-generated fact-infused rewrites of questions from the widely-used SQuAD dataset to address this limitation. Questions in FIRS were obtained by combining a given question with facts of entities referenced in the question. We study a double encoder-decoder model, Fact-Infused Question Generator (FIQG), for learning to generate fact-infused questions from a given question. Experimental results show that FIQG effectively incorporates information from facts to add more detail to a given question. To the best of our knowledge, ours is the first study to present fact-infusion as a novel form of question paraphrasing.

Suicide Risk Prediction by Tracking Self-Harm Aspects in Tweets: NUS-IDS at the CLPsych 2021 Shared Task
Sujatha Das Gollapalli | Guilherme Augusto Zagatti | See-Kiong Ng
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

We describe our system, developed for the CLPsych 2021 Shared Task, for identifying users at risk for suicide based on their tweets. Based on research in mental health studies linking self-harm tendencies with suicide, in our system, we attempt to characterize self-harm aspects expressed in user tweets over a period of time. To this end, we design SHTM, a Self-Harm Topic Model that combines Latent Dirichlet Allocation with a self-harm dictionary for modeling daily tweets of users. Next, differences in moods and topics over time are captured as features to train a deep learning model for suicide prediction.

NUS-IDS at CASE 2021 Task 1: Improving Multilingual Event Sentence Coreference Identification With Linguistic Information
Fiona Anting Tan | Sujatha Das Gollapalli | See-Kiong Ng
Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)

Event Sentence Coreference Identification (ESCI) aims to cluster together event sentences that refer to the same event for information extraction. We describe our ESCI solution developed for the ACL-CASE 2021 shared tasks on the detection and classification of socio-political and crisis event information in a multilingual setting. For a given article, our proposed pipeline comprises an accurate sentence pair classifier that identifies coreferent sentence pairs, followed by a step that uses the predicted probabilities to cluster sentences into groups. Sentence pair representations are constructed from fine-tuned BERT embeddings plus POS embeddings fed through a BiLSTM model, and combined with linguistic-based lexical and semantic similarities between sentences. Our best models ranked 2nd, 1st and 2nd and obtained CoNLL F1 scores of 81.20%, 93.03%, 83.15% for the English, Portuguese and Spanish test sets respectively in the ACL-CASE 2021 competition.
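
The second stage of the ESCI pipeline described above, turning pairwise coreference probabilities into sentence groups, can be illustrated with a minimal sketch. The abstract does not say which clustering procedure the authors used; the version below assumes a simple threshold followed by connected components (union-find), and the function name `cluster_sentences`, the `threshold` parameter, and the example probabilities are all hypothetical.

```python
def cluster_sentences(n_sentences, pair_probs, threshold=0.5):
    """Group sentence indices into clusters.

    pair_probs maps (i, j) index pairs to a predicted coreference
    probability. Pairs at or above `threshold` are linked, and the
    connected components of the resulting graph are the clusters.
    This is an assumed procedure, not the authors' confirmed method.
    """
    parent = list(range(n_sentences))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), prob in pair_probs.items():
        if prob >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # Attach the larger root index under the smaller one.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters = {}
    for i in range(n_sentences):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical classifier outputs for four event sentences.
example_probs = {
    (0, 1): 0.90, (0, 2): 0.10, (0, 3): 0.05,
    (1, 2): 0.20, (1, 3): 0.10, (2, 3): 0.80,
}
print(cluster_sentences(4, example_probs))  # [[0, 1], [2, 3]]
```

Connected components make the grouping transitive: if sentences 0 and 1 are linked and 1 and 2 are linked, all three share a cluster even when the classifier scores the pair (0, 2) low. Other choices, such as agglomerative clustering over the probabilities, would behave differently in that case.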

2020

On the Use of Web Search to Improve Scientific Collections
Krutarth Patel | Cornelia Caragea | Sujatha Das Gollapalli
Proceedings of the First Workshop on Scholarly Document Processing

Despite the advancements in search engine features, ranking methods, technologies, and the availability of programmable APIs, current-day open-access digital libraries still rely on crawl-based approaches for acquiring their underlying document collections. In this paper, we propose a novel search-driven framework for acquiring documents for such scientific portals. Within our framework, publicly-available research paper titles and author names are used as queries to a Web search engine. We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries. Moreover, through a combination of title and author name search, we were able to recover 78% of the original searched titles.

ESTeR: Combining Word Co-occurrences and Word Associations for Unsupervised Emotion Detection
Sujatha Das Gollapalli | Polina Rozenshtein | See-Kiong Ng
Findings of the Association for Computational Linguistics: EMNLP 2020

Accurate detection of emotions in user-generated text was shown to have several applications for e-commerce, public well-being, and disaster management. Currently, the state-of-the-art performance for emotion detection in text is obtained using complex, deep learning models trained on domain-specific, labeled data. In this paper, we propose ESTeR, an unsupervised model for identifying emotions using a novel similarity function based on random walks on graphs. Our model combines large-scale word co-occurrence information with word-associations from lexicons, avoiding not only the dependence on labeled datasets, but also an explicit mapping of words to latent spaces used in emotion-enriched word embeddings. Our similarity function can also be computed efficiently. We study a range of datasets including recent tweets related to COVID-19 to illustrate the superior performance of our model and report insights on public emotions during the on-going pandemic.