Raksha Sharma

2025

Identifying Aggression and Offensive Language in Code-Mixed Tweets: A Multi-Task Transfer Learning Approach
Bharath Kancharla | Prabhjot Singh | Lohith Bhagavan Kancharla | Yashita Chama | Raksha Sharma
Proceedings of the First Workshop on Natural Language Processing for Indo-Aryan and Dravidian Languages

The widespread use of social media has contributed to an increase in hate speech and offensive language, affecting people of all ages. The problem is particularly difficult to address when the text is code-mixed, and Twitter is commonly used to express opinions in code-mixed language. In this paper, we introduce a novel Multi-Task Transfer Learning (MTTL) framework to detect aggression and offensive language. By focusing on two facets of cyberbullying, aggressiveness and offensiveness, our model uses MTTL to improve performance on both aggression detection and offensive language detection. Results show that our MTTL setup significantly improves the performance of the state-of-the-art pretrained language models BERT, RoBERTa, and Hing-RoBERTa on Hindi-English code-mixed data from Twitter.
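A minimal sketch of the multi-task idea, assuming a shared encoder whose output is stubbed here as a fixed feature vector: two task-specific heads (aggression and offensiveness) read the same representation, and their cross-entropy losses are summed with per-task weights. This is not the authors' MTTL implementation; the feature values, head weights, class counts, and equal loss weights are all hypothetical, and in practice the shared encoder would be a fine-tuned model such as BERT, RoBERTa, or Hing-RoBERTa.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    # One row of `weights` per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    # Negative log-probability of the gold class.
    return -math.log(softmax(logits)[target])


def multitask_loss(shared, heads, targets, task_weights):
    """Weighted sum of per-task losses computed from one shared representation.

    shared: feature vector from a (hypothetical) shared encoder.
    heads: task name -> (weights, bias) of that task's linear classifier.
    targets: task name -> gold class index.
    task_weights: task name -> weight of that task's loss.
    """
    total = 0.0
    for task, (weights, bias) in heads.items():
        logits = linear(shared, weights, bias)
        total += task_weights[task] * cross_entropy(logits, targets[task])
    return total


# Hypothetical example: 3 aggression classes and 2 offensiveness classes.
shared = [0.2, -0.1, 0.5]
heads = {
    "aggression": ([[0.1, 0.0, 0.3], [0.0, 0.2, 0.1], [-0.2, 0.1, 0.4]], [0.0, 0.0, 0.0]),
    "offensive": ([[0.3, -0.1, 0.0], [0.1, 0.2, 0.5]], [0.0, 0.0]),
}
loss = multitask_loss(
    shared,
    heads,
    targets={"aggression": 2, "offensive": 1},
    task_weights={"aggression": 1.0, "offensive": 1.0},
)
```

Because the summed loss is backpropagated through the shared encoder from both heads, each task provides a training signal for the other; `task_weights` controls that trade-off.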

Power doesn’t reside in size: A Low Parameter Hybrid Language Model (HLM) for Sentiment Analysis in Code-mixed data
Pavan Sai Balaga | Nagasamudram Karthik | Challa Vishwanath | Raksha Sharma | Rudra Murthy | Ashish Mittal
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Code-mixed text, where multiple languages are used within the same utterance, is increasingly common in both spoken and written communication. However, it presents significant challenges for machine learning models because of the interplay of distinct grammatical structures, which effectively form a hybrid language. While fine-tuning large language models (LLMs) such as GPT-3 or Llama-3 on code-mixed data has improved performance, these models still lag behind their monolingual counterparts and incur high computational costs due to their large number of trainable parameters. In this paper, we focus on sentiment detection in code-mixed text and propose a Hybrid Language Model (HLM) that combines a multilingual encoder (e.g., mBERT) with a lightweight decoder (e.g., Sarvam-1, with 3B parameters). Despite having significantly fewer trainable parameters, HLM achieves sentiment classification performance comparable to that of fine-tuned LLMs with more than 7B parameters. Furthermore, our results demonstrate that HLM significantly outperforms models trained individually, underscoring its effectiveness for low-resource, code-mixed sentiment analysis.

2023

Late Fusion of Transformers for Sentiment Analysis of Code-Switched Data
Gagan Sharma | R Chinmay | Raksha Sharma
Findings of the Association for Computational Linguistics: EMNLP 2023

Code-switching is a common phenomenon in multilingual communities and is frequent on social media. However, sentiment analysis of code-switched data is a challenging and relatively unexplored area of research. This paper develops a sentiment analysis system for code-switched data. We present a novel approach that combines two transformers by feeding the logits of their outputs to a neural network for classification. We show the efficacy of our approach on two benchmark datasets, English-Hindi (En-Hi) and English-Spanish (En-Es), made available through Microsoft's GLUECoS benchmark. Our approach achieves F1 scores of 73.66% for En-Es and 61.24% for En-Hi, significantly higher than the best model reported on the GLUECoS benchmark.
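The late-fusion step can be sketched as follows, under stated assumptions: the two transformers are represented only by their already-computed logit vectors, the fusion classifier is a one-hidden-layer network with ReLU, and the three sentiment classes and all weights are hypothetical hand-set values rather than parameters from the paper.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(x, weights, bias):
    # One row of `weights` per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def late_fusion_predict(logits_a, logits_b, hidden, output):
    """Concatenate two models' logits and classify them with a small MLP.

    hidden, output: (weights, bias) pairs for the two fully connected layers.
    Returns the index of the highest-scoring class.
    """
    fused = list(logits_a) + list(logits_b)  # late fusion: join the model outputs
    h = relu(linear(fused, *hidden))
    scores = linear(h, *output)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical classes: 0 = negative, 1 = neutral, 2 = positive.
# Hidden unit 0 sums both models' positive logits; unit 1 sums their negative logits.
hidden = ([[0, 0, 1, 0, 0, 1], [1, 0, 0, 1, 0, 0]], [0.0, 0.0])
output = ([[0, 1], [0, 0], [1, 0]], [0.0, 0.5, 0.0])

pred = late_fusion_predict([0.1, 0.3, 2.0], [0.2, 0.1, 1.5], hidden, output)
```

In a real system the fusion weights would be learned from labeled data rather than set by hand; only the structure (logits in, class scores out) follows the description above.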

IITR at BioLaySumm Task 1: Lay Summarization of BioMedical articles using Transformers
Venkat praneeth Reddy | Pinnapu Reddy Harshavardhan Reddy | Karanam Sai Sumedh | Raksha Sharma
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

We first analyzed the datasets statistically to learn how various sections contribute to the final summary in both the PLOS and eLife datasets. We found that in both datasets the Introduction and Abstract, along with some initial parts of the Results, contribute to the summary, so we considered only these sections in the next stage of analysis. We then found the optimal length (number of sentences) of each of the Introduction, Abstract, and Results that contributes best to the summary. After this statistical analysis, we fine-tuned the pre-trained model facebook/bart-base on both the PLOS and eLife datasets. Because the texts are very long, we used chunking during fine-tuning and testing so as not to lose information due to the model's token limit. Finally, the eLife model gave more accurate results than the PLOS model in terms of readability, probably because PLOS summaries are closer to their abstracts; we therefore chose the eLife model as our final model and tuned its hyperparameters. We ranked 7th overall and 1st in readability.
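The chunking step can be illustrated with a small helper that splits a long token sequence into windows no longer than the model's input limit (1024 positions for BART-base). This is a sketch, not the team's code: whitespace splitting stands in for the model's subword tokenizer, the overlap parameter `stride` is an assumption the description does not mention, and how per-chunk outputs are merged is left open.

```python
def chunk_tokens(tokens, max_len, stride=0):
    """Split a token sequence into consecutive windows of at most max_len tokens.

    stride: number of tokens shared by neighboring windows (0 means no overlap).
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("require max_len > 0 and 0 <= stride < max_len")
    step = max_len - stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the sequence
    return chunks


# Whitespace "tokens" stand in for subword tokens here.
article = "the introduction describes the study and the abstract summarizes it"
chunks = chunk_tokens(article.split(), max_len=4)
```

A nonzero `stride` repeats a few tokens at each boundary so that a sentence cut by the limit still appears whole in at least one window, at the cost of some redundant computation.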

2022

Leveraging Dependency Grammar for Fine-Grained Offensive Language Detection using Graph Convolutional Networks
Divyam Goel | Raksha Sharma
Proceedings of the Tenth International Workshop on Natural Language Processing for Social Media

The last few years have witnessed an exponential rise in the propagation of offensive text on social media. Identifying this text with high precision is crucial for the well-being of society. Most existing approaches tend to assign high toxicity scores to innocuous statements (e.g., “I am a gay man”). These false positives result from over-generalization on training data in which specific terms in the statement may have been used in a pejorative sense (e.g., “gay”). Emphasis on such words alone can lead to discrimination against the very groups these systems are designed to protect. In this paper, we address offensive language detection on Twitter while also detecting the type and the target of the offense. We propose a novel approach called SyLSTM, which integrates syntactic features, in the form of the dependency parse tree of a sentence, and semantic features, in the form of word embeddings, into a deep learning architecture using a Graph Convolutional Network. Results show that the proposed approach significantly outperforms the state-of-the-art BERT model with orders of magnitude fewer parameters.
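To make the graph-convolution step concrete, the sketch below applies one layer of the widely used GCN propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), over a dependency tree treated as an undirected graph. It is an illustration under assumptions, not SyLSTM itself: the parse of “I am a gay man” is a hypothetical hand-written one, and the node features and weights are placeholders for word embeddings and learned parameters.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph on n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for head, dep in edges:
        a[head][dep] = a[dep][head] = 1.0  # ignore arc direction
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


# Hypothetical parse of "I am a gay man": "man" (index 4) heads every other word.
tokens = ["I", "am", "a", "gay", "man"]
edges = [(4, 0), (4, 1), (4, 2), (4, 3)]
a_hat = normalized_adjacency(len(tokens), edges)

# Placeholder 2-d features per token and an identity weight matrix.
h = [[0.1, 0.0], [0.0, 0.2], [0.0, 0.0], [0.9, 0.4], [0.3, 0.5]]
w = [[1.0, 0.0], [0.0, 1.0]]
h_next = gcn_layer(a_hat, h, w)  # each word now mixes in its syntactic neighbors
```

Only words connected by a dependency arc exchange information in one layer; stacking layers lets information travel further along the tree.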

Transformer-based Architecture for Empathy Prediction and Emotion Classification
Himil Vasava | Pramegh Uikey | Gaurav Wasnik | Raksha Sharma
Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis

This paper describes the contribution of team PHG to the WASSA 2022 shared task on Empathy Prediction and Emotion Classification. The broad goal of the task was to predict an empathy score, a distress score, and the type of emotion associated with a person who wrote an essay in reaction to a newspaper article. We fine-tune a RoBERTa model with a few additional layers added on top of the transformer. We also use a few machine learning techniques to augment and upsample the data. Our system achieves a Pearson correlation coefficient of 0.488 on Task 1 (Empathy: 0.470, Distress: 0.506) and a macro F1-score of 0.531 on Task 2.
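The reported Task 1 figure is consistent with averaging the two correlations: (0.470 + 0.506) / 2 = 0.488. The sketch below computes Pearson's r from scratch and reproduces that average; the score lists used to exercise it are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical gold and predicted empathy scores for four essays.
gold = [1.0, 3.5, 4.0, 6.0]
pred = [1.5, 3.0, 4.5, 5.5]
r = pearson(gold, pred)

# Task 1 score as the mean of the two reported correlations.
task1 = (0.470 + 0.506) / 2
```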

IITR CodeBusters at SemEval-2022 Task 5: Misogyny Identification using Transformers
Gagan Sharma | Gajanan Sunil Gitte | Shlok Goyal | Raksha Sharma
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents our submission to Task 5 (Multimedia Automatic Misogyny Identification) of the SemEval 2022 competition. The purpose of the task is to identify whether given memes are misogynistic and, further, to label the type of misogyny involved. In this paper, we present our approach based on language processing tools. We embed meme texts using GloVe embeddings and classify misogyny using a BERT model. Our model obtains F1-scores of 66.24% and 63.5% for misogyny classification and misogyny labeling, respectively.

2021

NLPIITR at SemEval-2021 Task 6: RoBERTa Model with Data Augmentation for Persuasion Techniques Detection
Vansh Gupta | Raksha Sharma
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes and examines different systems for Task 6 of SemEval-2021: Detection of Persuasion Techniques In Texts And Images, Subtask 1. The task is to build a model that identifies rhetorical and psychological techniques (such as causal oversimplification, name-calling, and smear) in the textual content of memes, which are often used in disinformation campaigns to influence users. The paper provides an extensive comparison of various machine learning systems for the task. We describe task-specific pre-processing of the text data and present ways to overcome the class imbalance. The results show that fine-tuning a RoBERTa model gives the best results, with a micro-F1 score of 0.51 on the development set.
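Because a single meme can use several techniques at once, micro-averaged F1 pools true positives, false positives, and false negatives over all labels before computing precision and recall, so frequent techniques weigh more than rare ones. A minimal sketch, assuming gold and predicted labels are given as one set per meme; the example memes and their labels are hypothetical and reuse technique names from the description above.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 for multi-label predictions.

    gold, pred: lists with one set of technique labels per meme.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present
        fp += len(p - g)  # labels predicted but absent
        fn += len(g - p)  # labels present but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"smear", "name-calling"}, {"causal oversimplification"}, set()]
pred = [{"smear"}, {"causal oversimplification", "name-calling"}, {"smear"}]
score = micro_f1(gold, pred)  # tp = 2, fp = 2, fn = 1
```

Here precision is 2/4 and recall is 2/3, giving an F1 of 4/7.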

Team_KGP at SemEval-2021 Task 7: A Deep Neural System to Detect Humor and Offense with Their Ratings in the Text Data
Anik Mondal | Raksha Sharma
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes the system submitted to SemEval-2021 Task 7 for all four subtasks. Two subtasks focus on detecting humor and offense in text (binary classification), while the other two predict humor and offense ratings of the text (regression). We present two fine-tuning methods that add either linear layers or bi-LSTM layers on top of the pre-trained BERT model. Results show that our system outperforms the baseline models by a significant margin. We report F1 scores of 0.90 for the first subtask and 0.53 for the third subtask, and RMSEs of 0.57 and 0.58 for the second and fourth subtasks, respectively.
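For the two rating subtasks, the reported RMSE is the square root of the mean squared difference between predicted and gold ratings, so lower is better. A short sketch with hypothetical ratings:

```python
import math


def rmse(gold, pred):
    """Root-mean-square error between gold and predicted ratings."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))


# Hypothetical humor ratings for three texts.
gold_ratings = [2.0, 3.5, 1.0]
pred_ratings = [2.5, 3.0, 1.0]
error = rmse(gold_ratings, pred_ratings)
```

Squaring before averaging penalizes a few large rating errors more heavily than many small ones.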

2018

Identifying Transferable Information Across Domains for Cross-domain Sentiment Classification
Raksha Sharma | Pushpak Bhattacharyya | Sandipan Dandapat | Himanshu Sharad Bhatt
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Getting manually labeled data in each domain is always an expensive and time-consuming task. Cross-domain sentiment analysis, in which a labeled source domain is used to build a sentiment classifier for an unlabeled target domain, has emerged as an important research problem. However, the polarity orientation (positive or negative) of a word and its significance for expressing an opinion often differ from one domain to another. Owing to these differences, cross-domain sentiment classification remains a challenging task. In this paper, we propose that words that do not change their polarity and significance represent the transferable (usable) information across domains for cross-domain sentiment classification. We present a novel approach based on the χ2 test and the cosine similarity between context vectors of words to identify polarity-preserving significant words across domains. Furthermore, we show that a weighted ensemble of classifiers enhances cross-domain classification performance.
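One plausible reading of the two tests, sketched under assumptions rather than taken from the paper: a word counts as significant if the χ2 statistic of its 2x2 table (present or absent, in positive or negative documents of the labeled source domain) exceeds 3.841, the 0.05 critical value for one degree of freedom; it is treated as polarity-preserving if its context vectors in the source and target domains have a cosine similarity above a threshold. The counts, context vectors, the helper `is_transferable`, and the 0.5 threshold are hypothetical.

```python
import math

CHI2_CRITICAL_DF1 = 3.841  # chi-square critical value for p = 0.05, one degree of freedom


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: positive source documents containing the word
    b: negative source documents containing the word
    c: positive source documents without the word
    d: negative source documents without the word
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def is_transferable(counts, source_context, target_context, sim_threshold=0.5):
    """Hypothetical filter: significant in the labeled source domain AND used in
    similar contexts (context-vector cosine) in the target domain."""
    significant = chi_square_2x2(*counts) > CHI2_CRITICAL_DF1
    return significant and cosine(source_context, target_context) >= sim_threshold


# Hypothetical counts and context vectors for one word.
counts = (30, 5, 20, 45)
chi = chi_square_2x2(*counts)
keep = is_transferable(counts, [0.8, 0.1, 0.3], [0.7, 0.2, 0.3])
```

The χ2 test can only use the labeled source domain; the context-vector comparison is what brings in the unlabeled target domain.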

2017

Sentiment Intensity Ranking among Adjectives Using Sentiment Bearing Word Embeddings
Raksha Sharma | Arpan Somani | Lakshya Kumar | Pushpak Bhattacharyya
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Identifying the intensity ordering among polar (positive or negative) words that have the same semantics can enable fine-grained sentiment analysis. For example, ‘master’, ‘seasoned’ and ‘familiar’ point to different intensity levels, though they all convey the same meaning (semantics), i.e., expertise: having a good knowledge of. In this paper, we propose a semi-supervised technique that uses sentiment bearing word embeddings to produce a continuous ranking among adjectives that share common semantics. Our system demonstrates a strong Spearman’s rank correlation of 0.83 with the gold-standard ranking. We show that sentiment bearing word embeddings yield a more accurate intensity ranking than other standard word embeddings (word2vec and GloVe), of which word2vec is the state of the art for the intensity ordering task.
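Spearman's rank correlation compares a system ranking with a gold ranking through the squared rank differences d_i, using rho = 1 - 6 Σ d_i^2 / (n (n^2 - 1)) when there are no ties. A minimal sketch; the gold order below (highest intensity first) and the system order are assumed purely for illustration.

```python
def spearman_rho(gold_order, predicted_order):
    """Spearman's rank correlation between two orderings of the same distinct items.

    Uses rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid when there are no ties.
    """
    if sorted(gold_order) != sorted(predicted_order) or len(set(gold_order)) != len(gold_order):
        raise ValueError("orderings must be permutations of the same distinct items")
    n = len(gold_order)
    if n < 2:
        raise ValueError("need at least two items")
    gold_rank = {word: i for i, word in enumerate(gold_order)}
    d2 = sum((gold_rank[word] - i) ** 2 for i, word in enumerate(predicted_order))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Illustrative only: an assumed gold order and a system order that swaps the top two.
gold = ["master", "seasoned", "familiar"]
system = ["seasoned", "master", "familiar"]
rho = spearman_rho(gold, system)
```

With only three adjectives, a single adjacent swap already drops rho to 0.5, which is why correlations are usually reported over many ranked groups.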

2016

Meaning Matters: Senses of Words are More Informative than Words for Cross-domain Sentiment Analysis
Raksha Sharma | Sudha Bhingardive | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

High, Medium or Low? Detecting Intensity Variation Among polar synonyms in WordNet
Raksha Sharma | Pushpak Bhattacharyya
Proceedings of the 8th Global WordNet Conference (GWC)

For fine-grained sentiment analysis, we need to go beyond zero-one polarity and find a way to compare adjectives (synonyms) that share the same sense. The choice of a word from a set of synonyms provides a way to select the exact polarity intensity. For example, choosing to describe a person as benevolent rather than kind changes the intensity of the expression. In this paper, we present a sense-based lexical resource in which synonyms are assigned intensity levels, viz., high, medium, and low. We show that the measure P(s|w) (the probability of a sense s given the word w) can derive the intensity of a word within the sense. We observe a statistically significant positive correlation between P(s|w) and the intensity of synonyms for three languages, viz., English, Marathi, and Hindi. The average correlation scores are 0.47 for English, 0.56 for Marathi, and 0.58 for Hindi.
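Given a sense-tagged corpus, P(s|w) can be estimated as the fraction of a word's occurrences tagged with sense s. The sketch below does this from (word, sense) pairs and, following the positive correlation reported above, orders the synonyms of one sense by P(s|w). The counts, the sense labels `GENEROUS` and `OTHER`, and the helper names are hypothetical, chosen only so that benevolent ranks above kind as in the example; the paper's actual estimation procedure may differ.

```python
from collections import Counter


def sense_given_word(tagged_tokens):
    """Estimate P(s | w) = count(w with sense s) / count(w) from (word, sense) pairs."""
    pair_counts = Counter(tagged_tokens)
    word_counts = Counter(word for word, _ in tagged_tokens)
    return {(word, sense): c / word_counts[word] for (word, sense), c in pair_counts.items()}


def rank_by_intensity(probs, sense, synonyms):
    """Order synonyms of one sense by P(sense | word), highest first."""
    return sorted(synonyms, key=lambda w: probs.get((w, sense), 0.0), reverse=True)


# Hypothetical sense-tagged occurrences; the sense labels are made up.
tagged = ([("kind", "GENEROUS")] * 6 + [("kind", "OTHER")] * 4
          + [("benevolent", "GENEROUS")] * 9 + [("benevolent", "OTHER")] * 1)
probs = sense_given_word(tagged)
ranking = rank_by_intensity(probs, "GENEROUS", ["kind", "benevolent"])
```

Mapping these probabilities onto discrete high, medium, and low levels would need thresholds that this sketch does not attempt to choose.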
