Suramya Jadhav

2025

Non-Contextual BERT or FastText? A Comparative Analysis
Abhay Shanbhag | Suramya Jadhav | Amogh Thakurdesai | Ridhima Bhaskar Sinare | Raviraj Joshi
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models

Natural Language Processing (NLP) for low-resource languages, which lack large annotated datasets, faces significant challenges due to limited high-quality data and linguistic resources. The selection of embeddings plays a critical role in achieving strong performance in NLP tasks. While contextual BERT embeddings require a full forward pass, non-contextual BERT embeddings rely only on table lookup. Existing research has primarily focused on contextual BERT embeddings, leaving non-contextual embeddings largely unexplored. In this study, we analyze the effectiveness of non-contextual embeddings from BERT models (MuRIL and MahaBERT) and FastText models (IndicFT and MahaFT) for tasks such as news classification, sentiment analysis, and hate speech detection in one such low-resource language—Marathi. We compare these embeddings with their contextual and compressed variants. Our findings indicate that non-contextual BERT embeddings extracted from the model’s first embedding layer outperform FastText embeddings, presenting a promising alternative for low-resource NLP.

pdf bib

On Limitations of LLM as Annotator for Low Resource Languages
Suramya Jadhav | Abhay Shanbhag | Amogh Thakurdesai | Ridhima Sinare | Raviraj Joshi
Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025)

pdf bib

MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models
Abhay Shanbhag | Suramya Jadhav | Amogh Thakurdesai | Ridhima Sinare | Raviraj Joshi | Ananya Joshi
Proceedings of the 39th Pacific Asia Conference on Language, Information and Computation

2024

pdf bib abs

Maven at MEDIQA-CORR 2024: Leveraging RAG and Medical LLM for Error Detection and Correction in Medical Notes
Suramya Jadhav | Abhay Shanbhag | Sumedh Joshi | Atharva Date | Sheetal Sonawane
Proceedings of the 6th Clinical Natural Language Processing Workshop

Addressing the critical challenge of identifying and rectifying medical errors in clinical notes, we present a novel approach tailored for the MEDIQA-CORR task @ NAACL-ClinicalNLP 2024, which comprises three subtasks: binary classification, span identification, and natural language generation for error detection and correction. Binary classification involves detecting whether the text contains a medical error; span identification entails identifying the text span associated with any detected error; and natural language generation focuses on providing a free text correction if a medical error exists. Our proposed architecture leverages Named Entity Recognition (NER) for identifying disease-related terms, Retrieval-Augmented Generation (RAG) for contextual understanding from external datasets, and a quantized and fine-tuned Palmyra model for error correction. Our model achieved a global rank of 5 with an aggregate score of 0.73298, calculated as the mean of ROUGE-1-F, BERTScore, and BLEURT scores.

pdf bib abs

Innovators at SemEval-2024 Task 10: Revolutionizing Emotion Recognition and Flip Analysis in Code-Mixed Texts
Abhay Shanbhag | Suramya Jadhav | Shashank Rathi | Siddhesh Pande | Dipali Kadam
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we introduce our system for all three tracks of the SemEval 2024 EDiReF Shared Task 10, which focuses on Emotion Recognition in Conversation (ERC) and Emotion Flip Reasoning (EFR) within the domain of conversational analysis. Task-Track 1 (ERC) aims to assign an emotion to each utterance in the Hinglish language from a predefined set of possible emotions. Tracks 2 (EFR) and 3 (EFR) aim to identify the trigger utterance(s) for an emotion flip in a multi-party conversation dialogue in Hinglish and English text, respectively. For Track 1, our study spans both traditional machine learning ensemble techniques, including Decision Trees, SVM, Logistic Regression, and Multinomial NB models, as well as advanced transformer-based models like XLM-Roberta (XLMR), DistilRoberta, and T5 from Hugging Face’s transformer library. In the EFR competition, we developed and proposed two innovative algorithms to tackle the challenges presented in Tracks 2 and 3. Specifically, our team, Innovators, developed a standout algorithm that propelled us to secure the 2nd rank in Track 2, achieving an impressive F1 score of 0.79, and the 7th rank in Track 3, with an F1 score of 0.68.

Co-authors

Ridhima Bhaskar Sinare 1

Sheetal Sonawane 1

Venues

SemEval1

Fix author