Sindhuja Gopalan

2025

Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems
Mansi Rana | Kadri Hacioglu | Sindhuja Gopalan | Maragathamani Boothalingam
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

Zero-shot slot filling is a well-established subtask of Natural Language Understanding (NLU). However, most existing methods focus primarily on single-turn text data, overlooking the unique complexities of conversational dialogue. Conversational data is highly dynamic, often involving abrupt topic shifts, interruptions, and implicit references that make it difficult to directly apply zero-shot slot filling techniques, even with the remarkable capabilities of large language models (LLMs). This paper addresses these challenges by proposing strategies for automatic data annotation with slot induction and black-box knowledge distillation (KD) from a teacher LLM to a smaller model, outperforming vanilla LLMs on internal datasets by a 26% absolute increase in F1 score. Additionally, we introduce an efficient system architecture for call center product settings that surpasses off-the-shelf extractive models by a 34% relative increase in F1 score, enabling near real-time inference on dialogue streams with higher accuracy while preserving low latency.
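The abstract above reports one gain as an absolute increase in F1 and the other as a relative one, and the two are not interchangeable. The following is a minimal sketch of the difference; the function names and the example scores are hypothetical and are not taken from the paper.

```python
def absolute_gain(baseline_f1: float, new_f1: float) -> float:
    """Gain in percentage points: the plain difference between two F1 scores."""
    return new_f1 - baseline_f1


def relative_gain(baseline_f1: float, new_f1: float) -> float:
    """Gain expressed as a percentage of the baseline F1 score."""
    if baseline_f1 <= 0:
        raise ValueError("baseline F1 must be positive")
    return (new_f1 - baseline_f1) / baseline_f1 * 100.0


# Hypothetical scores for illustration only (not results from the paper).
baseline, improved = 40.0, 50.0
print(absolute_gain(baseline, improved))  # 10.0 percentage points
print(relative_gain(baseline, improved))  # 25.0 percent of the baseline
```

The same pair of scores therefore yields a smaller absolute figure and a larger relative one whenever the baseline is below 100.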

2023

Scaling Neural ITN for Numbers and Temporal Expressions in Tamil: Findings for an Agglutinative Low-resource Language
Bhavuk Singhal | Sindhuja Gopalan | Amrith Krishna | Malolan Chetlur
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Inverse text normalization (ITN) involves rewriting the verbalised form of text from spoken transcripts to its corresponding written form. The task inherently poses challenges in identifying ITN entries due to spelling variations in words arising from dialects, transcription errors, etc. Additionally, in Tamil, word boundaries between adjacent words in a sentence often get obscured due to Punarchi, i.e. phonetic transformation of these boundaries. Being morphologically rich, the words in Tamil show a high degree of agglutination due to inflection and clitics. The combination of such factors leads to a high degree of surface-form variation, making scalability with pure rule-based approaches difficult. Instead, we experiment with fine-tuning three pre-trained neural LMs, consisting of a seq2seq model (s2s), a non-autoregressive text editor (NAR) and a sequence tagger + rules combination (tagger). While the tagger approach works best in a fully-supervised setting, s2s performs best (98.05 F-Score) when augmented with additional data, via bootstrapping and data augmentation (DA&B). s2s reports a cumulative percentage improvement of 20.1%, and all our models show statistically significant gains with DA&B. Compared to a fully supervised setup, bootstrapping alone reports a percentage improvement as high as 14.12%, even with a small seed set of 324 ITN entries.

2017

Scalable Bio-Molecular Event Extraction System towards Knowledge Acquisition
Pattabhi RK Rao | Sindhuja Gopalan | Sobha Lalitha Devi
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

Cross Linguistic Variations in Discourse Relations among Indian Languages
Sindhuja Gopalan | Lakshmi S | Sobha Lalitha Devi
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

BioDCA Identifier: A System for Automatic Identification of Discourse Connective and Arguments from Biomedical Text
Sindhuja Gopalan | Sobha Lalitha Devi
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

This paper describes a natural language processing system developed for automatic identification of explicit connectives, their senses and arguments. Prior work has shown that differences in the usage of connectives across corpora negatively affect the cross-domain connective identification task. Hence the development of domain-specific discourse parsers has become indispensable. Here, we present a corpus of Medline abstracts annotated with discourse relations. A Kappa score is calculated to check the annotation quality of our corpus. Previous work on discourse analysis in biomedical data has concentrated only on the identification of connectives, and hence we have developed an end-to-end parser for connective and argument identification using the Conditional Random Fields algorithm. The type and sub-type of the connective sense are also identified. The results obtained are encouraging.
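The abstract above mentions a Kappa score as the check on annotation quality but does not say which variant was used. As a minimal sketch, assuming Cohen's kappa over two annotators who label the same tokens, the computation could look like the following; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the paper's exact kappa variant is not stated; this is the
    standard two-annotator form, (p_o - p_e) / (1 - p_e).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy labels for illustration only: CONN marks a connective, O anything else.
annotator_1 = ["CONN", "CONN", "O", "O"]
annotator_2 = ["CONN", "O", "O", "O"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.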