Vishal S


2025

ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech
Amritha Nandini K L | Vishal S | Giri Prasath R | Anerud Thiyagarajan | Sachin Kumar S
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Caste and migration hate speech detection is a critical task in the context of increasingly multilingual and diverse online discourse. In this work, we address the problem of identifying hate speech targeting caste and migrant communities across a multilingual social media dataset containing Tamil, Tamil written in English script, and English. We explore and compare different feature representations, including TF-IDF vectors and embeddings from pretrained transformer-based models, to train various machine learning classifiers. Our experiments show that a Soft Voting Classifier that combines TF-IDF vectors and MuRIL embeddings performs best, achieving a macro F1 score of 0.802 on the test set. This approach was evaluated as part of the Shared Task on Caste and Migration Hate Speech Detection at LT-EDI@LDK 2025, where it ranked 6th overall.
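A minimal sketch of how TF-IDF and MuRIL representations could be fused via soft voting, as the abstract describes: one base classifier per feature view, with their class probabilities averaged. The logistic-regression base learners, hyperparameters, and the variable names (train_texts, train_labels, test_texts, test_labels) are illustrative assumptions, not the authors' exact pipeline.

```python
# Soft voting over a TF-IDF classifier and a MuRIL-embedding classifier (sketch).
# train_texts/train_labels/test_texts/test_labels are hypothetical placeholders.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

MURIL = "google/muril-base-cased"
tokenizer = AutoTokenizer.from_pretrained(MURIL)
encoder = AutoModel.from_pretrained(MURIL).eval()

def muril_embed(texts, batch_size=16):
    """Mask-aware mean pooling of MuRIL's last hidden states."""
    feats = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            enc = tokenizer(texts[i:i + batch_size], padding=True,
                            truncation=True, max_length=128, return_tensors="pt")
            hidden = encoder(**enc).last_hidden_state        # (B, T, H)
            mask = enc["attention_mask"].unsqueeze(-1)       # (B, T, 1)
            pooled = (hidden * mask).sum(1) / mask.sum(1)    # mean over real tokens
            feats.append(pooled.cpu().numpy())
    return np.vstack(feats)

# Two views of the same texts.
tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
X_tfidf_train = tfidf.fit_transform(train_texts)
X_tfidf_test = tfidf.transform(test_texts)
X_muril_train = muril_embed(train_texts)
X_muril_test = muril_embed(test_texts)

# One classifier per view; soft vote = average of predicted probabilities.
clf_tfidf = LogisticRegression(max_iter=1000).fit(X_tfidf_train, train_labels)
clf_muril = LogisticRegression(max_iter=1000).fit(X_muril_train, train_labels)

proba = (clf_tfidf.predict_proba(X_tfidf_test) +
         clf_muril.predict_proba(X_muril_test)) / 2
pred = clf_tfidf.classes_[proba.argmax(axis=1)]
print("macro F1:", f1_score(test_labels, pred, average="macro"))
```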

TechSSN3 at SemEval-2025 Task 11: Multi-Label Emotion Detection Using Ensemble Transformer Models and Lexical Rules
Vishal S | Rajalakshmi Sivanaiah | Angel Deborah S
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Transformer models, specifically BERT-Large Uncased, DeBERTa, and RoBERTa, are first employed to classify the dataset, with their hyperparameters fine-tuned to identify the most effective configuration. These models leverage deep contextual embeddings to capture nuanced semantic and syntactic information, making them powerful for sentiment analysis. However, transformer-based models alone may not fully capture the structural aspects of sentiment-bearing sentences.

To address this, part-of-speech (POS) tagging is incorporated using a Hidden Markov Model (HMM) to analyze sentence structure and identify the key words responsible for conveying sentiment. By isolating adjectives, adverbs, and verbs, the lexical sentiment of individual words is determined using a polarity-based scoring method. This lexical score, derived from sentiment lexicons like SentiWordNet, provides an additional layer of interpretability, particularly in cases where transformer models struggle with implicit sentiment cues or negation handling.

A key innovation in this approach is the adaptive weighting mechanism used to combine the outputs of the transformer models and lexical scoring. Instead of assigning uniform importance to each method, a unique weight is assigned to each model for every emotion category, ensuring that the best-performing approach contributes more significantly to the final sentiment prediction. For instance, DeBERTa, which excels in contextual understanding, is given more weight for subtle emotions like sadness, whereas lexical scoring is emphasized for emotions heavily influenced by explicit adjectives, such as joy or anger. The weight allocation is determined empirically through performance evaluation on a validation set, ensuring an optimal balance between deep learning-based contextual understanding and rule-based sentiment assessment.

Additionally, traditional machine learning models such as Support Vector Machines (SVMs), Decision Trees, and Random Forests are tested for comparative analysis. However, these models demonstrate inferior performance, struggling to capture deep contextual semantics and to handle nuanced expressions of sentiment, reinforcing the superiority of the hybrid transformer + lexical approach.

This method not only enhances interpretability but also improves accuracy, particularly in cases where sentiment is influenced by structural elements, negations, or compound expressions. The combined framework ensures a more robust and adaptable sentiment analysis model, effectively balancing data-driven learning and linguistic insights.
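A minimal sketch of the per-emotion adaptive weighting the abstract describes, assuming the transformer-ensemble probabilities and lexical polarity scores have already been computed as arrays of shape (n_samples, n_emotions). The label set, grid search, decision threshold, and array names (val_probs, val_lex, val_y, test_probs, test_lex) are illustrative assumptions, not the authors' exact configuration.

```python
# Per-emotion weighting of transformer probabilities vs. lexical scores (sketch).
# All input arrays are hypothetical placeholders with values in [0, 1].
import numpy as np
from sklearn.metrics import f1_score

EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]  # illustrative label set

def blend(probs, lex, weights):
    """Per-emotion combination: w * transformer_prob + (1 - w) * lexical_score."""
    return weights * probs + (1.0 - weights) * lex

def tune_weights(val_probs, val_lex, val_y, grid=np.linspace(0, 1, 21), thr=0.5):
    """Pick, for each emotion, the blend weight that maximises validation F1."""
    weights = np.zeros(val_probs.shape[1])
    for e in range(val_probs.shape[1]):
        scores = []
        for w in grid:
            blended = w * val_probs[:, e] + (1 - w) * val_lex[:, e]
            scores.append(f1_score(val_y[:, e], (blended >= thr).astype(int)))
        weights[e] = grid[int(np.argmax(scores))]
    return weights

# Usage: tune weights on the validation split, then apply them to the test split.
# weights = tune_weights(val_probs, val_lex, val_y)
# test_pred = (blend(test_probs, test_lex, weights) >= 0.5).astype(int)
```

The design choice here mirrors the abstract's intent: emotions where the lexical signal dominates end up with a small transformer weight, while context-heavy emotions lean on the transformer probabilities.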