2025
Code_Conquerors@DravidianLangTech 2025: Multimodal Misogyny Detection in Dravidian Languages Using Vision Transformer and BERT
Pathange Omkareshwara Rao | Harish Vijay V | Ippatapu Venkata Srichandra | Neethu Mohan | Sachin Kumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This research focuses on misogyny detection in Dravidian languages using multimodal techniques. It leverages advanced machine learning models, including Vision Transformers (ViT) for image analysis and BERT-based transformers for text processing. The study highlights the challenges of working with regional datasets and addresses these with innovative preprocessing and model training strategies. The evaluation reveals significant improvements in detection accuracy, showcasing the potential of multimodal approaches in combating online abuse in underrepresented languages.
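A minimal late-fusion sketch of the kind of pipeline described above, not the authors' code: a ViT encoder for the meme image and a multilingual BERT encoder for the text, with their pooled features concatenated and passed to a linear classification head. The checkpoint names, the fusion head, and the placeholder inputs are all assumptions for illustration (Python, PyTorch + Hugging Face Transformers).

import torch
import torch.nn as nn
from PIL import Image
from transformers import ViTModel, ViTImageProcessor, BertModel, BertTokenizer

class MultimodalMisogynyClassifier(nn.Module):
    # Late fusion: concatenate ViT and BERT pooled features, classify with a linear head.
    def __init__(self, num_labels=2):
        super().__init__()
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")   # assumed checkpoint
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")      # assumed checkpoint
        fused_dim = self.vit.config.hidden_size + self.bert.config.hidden_size
        self.head = nn.Linear(fused_dim, num_labels)

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.vit(pixel_values=pixel_values).last_hidden_state[:, 0]           # ViT [CLS] token
        txt = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.head(torch.cat([img, txt], dim=-1))

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = MultimodalMisogynyClassifier()

image = Image.new("RGB", (224, 224))                                                # placeholder meme image
pixels = processor(images=image, return_tensors="pt").pixel_values
text = tokenizer("placeholder caption", return_tensors="pt", truncation=True, padding=True)
logits = model(pixels, text.input_ids, text.attention_mask)                          # shape: (1, num_labels)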
Cyber Protectors@DravidianLangTech 2025: Abusive Tamil and Malayalam Text Targeting Women on Social Media using FastText
Rohit Vp | Madhav M | Ippatapu Venkata Srichandra | Neethu Mohan | Sachin Kumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Social media has transformed communication, but it has also opened new avenues for the abuse of women. Because of the complex morphology, large vocabulary, and frequent code-mixing of Tamil and Malayalam, identifying discriminatory text in such linguistically diverse settings is especially challenging. Because traditional moderation systems frequently miss these linguistic subtleties, gendered abuse in many forms, from outright threats to character insults and body shaming, continues. In addition to examining the sociocultural characteristics of this type of harassment on social media, this study compares the effectiveness of several Natural Language Processing (NLP) models, including FastText, transformer-based architectures, and BiLSTM. Our results show that FastText achieved a macro F1 score of 0.74 on the Tamil dataset and 0.64 on the Malayalam dataset, outperforming the transformer model (macro F1 of 0.62) and the BiLSTM (0.57). By addressing the limitations of existing moderation techniques, this research underscores the urgent need for language-specific AI solutions to foster safer digital spaces for women.
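A minimal sketch of a fastText supervised classifier of the kind named in the title, not the shared-task submission itself; the training file, labels, and hyperparameters are placeholders (fastText expects one "__label__<class> text" line per example).

import fasttext

# Write a tiny placeholder training file in fastText's supervised format.
with open("train.txt", "w", encoding="utf-8") as f:
    f.write("__label__abusive placeholder abusive comment\n")
    f.write("__label__not_abusive placeholder neutral comment\n")

model = fasttext.train_supervised(
    input="train.txt",
    lr=0.5,
    epoch=25,
    wordNgrams=2,   # word n-grams help with code-mixed, morphologically rich text
    dim=50,
)
labels, probs = model.predict("placeholder comment to classify")
print(labels[0], probs[0])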
Wictory@DravidianLangTech 2025: Political Sentiment Analysis of Tamil X (Twitter) Comments using LaBSE and SVM
Nithish Ariyha K | Eshwanth Karti T R | Yeshwanth Balaji A P | Vikash J | Sachin Kumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Political sentiment analysis has become an essential area of research in Natural Language Processing (NLP), driven by the rapid rise of social media as a key platform for political discourse. This study focuses on sentiment classification in Tamil political tweets, addressing the linguistic and cultural complexities inherent in low-resource languages. To overcome data scarcity, we develop a system that integrates embeddings with advanced machine learning techniques to ensure effective sentiment categorization. Our approach leverages deep learning-based models and transformer architectures to capture nuanced expressions, contributing to improved sentiment classification. This work enhances NLP methodologies for low-resource languages and provides valuable insights into Tamil political discussions, aiding policymakers and researchers in understanding public sentiment more accurately. Notably, our system secured Rank 5 in the NAACL shared task, demonstrating its effectiveness in real-world sentiment classification challenges.
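A minimal sketch matching the title's LaBSE + SVM setup, assumed rather than taken from the submitted system: LaBSE sentence embeddings are fed to a linear SVM; the texts and labels are placeholders.

from sentence_transformers import SentenceTransformer
from sklearn.svm import SVC

encoder = SentenceTransformer("sentence-transformers/LaBSE")   # multilingual sentence encoder

train_texts = ["placeholder Tamil tweet 1", "placeholder Tamil tweet 2"]   # placeholder data
train_labels = ["positive", "negative"]

X_train = encoder.encode(train_texts)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, train_labels)

print(clf.predict(encoder.encode(["another placeholder tweet"])))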
ANSR@DravidianLangTech 2025: Detection of Abusive Tamil and Malayalam Text Targeting Women on Social Media using RoBERTa and XGBoost
Nishanth S | Shruthi Rengarajan | S Ananthasivan | Burugu Rahul | Sachin Kumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Abusive language directed at women on social media, often characterized by crude slang, offensive terms, and profanity, is not just harmful communication but also a tool for serious and widespread cyber violence. Addressing this pressing issue is imperative for establishing safer online spaces and providing efficient methods for detecting and minimising such abuse. However, the intentional masking of abusive language, especially in regional languages like Tamil and Malayalam, presents significant obstacles, making detection and prevention more difficult. The system created effectively identifies abusive sentences using supervised machine learning techniques based on RoBERTa embeddings. The method aims to improve upon current abusive language detection systems, which are essential for various online platforms, including social media and online gaming services. The proposed method ranked 8th for Malayalam and 20th for Tamil in terms of F1 score.
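A minimal sketch of the RoBERTa + XGBoost combination named in the title, assumed rather than reproduced from the paper: frozen encoder embeddings of each comment are used as features for an XGBoost classifier. The multilingual checkpoint and the toy data are placeholders.

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from xgboost import XGBClassifier

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")   # assumed RoBERTa-family checkpoint
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def embed(texts):
    # Use the first (<s>) token embedding as a sentence-level feature.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return encoder(**batch).last_hidden_state[:, 0].numpy()

X = embed(["placeholder comment 1", "placeholder comment 2"])   # placeholder data
y = np.array([1, 0])                                            # 1 = abusive, 0 = not abusive
clf = XGBClassifier(n_estimators=200, max_depth=6)
clf.fit(X, y)
print(clf.predict(embed(["another placeholder comment"])))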
Synapse@DravidianLangTech 2025: Multiclass Political Sentiment Analysis in Tamil X (Twitter) Comments: Leveraging Feature Fusion of IndicBERTv2 and Lexical Representations
Suriya Kp | Durai Singh K | Vishal A S | Kishor S | Sachin Kumar S
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Social media platforms like X (Twitter) have gained popularity for political debates and election campaigns over the last decade. This creates the need to moderate and understand the sentiment of tweets in order to gauge the state of digital campaigns. This paper focuses on political sentiment classification of Tamil X (Twitter) comments, which is challenging because of informal expressions, code-switching, and limited annotated datasets. The study categorizes comments into seven classes: substantiated, sarcastic, opinionated, positive, negative, neutral, and none of the above. This paper proposes a solution to the Political Multiclass Sentiment Analysis of Tamil X (Twitter) Comments shared task (DravidianLangTech@NAACL 2025); the solution incorporates the IndicBERTv2-MLM-Back-Translation model and TF-IDF vectors into a custom model. Further, we explore preprocessing techniques that enrich hashtags and emojis with their context. Our approach achieved Rank 1 with a macro F1 average of 0.38 in the shared task.
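A minimal sketch of the feature-fusion idea, not the ranked submission: contextual embeddings from an IndicBERTv2-family encoder are concatenated with TF-IDF vectors and fed to a simple classifier. The exact checkpoint, the classifier head (logistic regression here), and the data are assumptions.

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

checkpoint = "ai4bharat/IndicBERTv2-MLM-only"   # assumed checkpoint; the paper uses a back-translation variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)

texts = ["placeholder Tamil tweet #hashtag", "another placeholder tweet"]   # placeholder data
labels = [0, 1]

# Lexical features: TF-IDF over the raw text.
tfidf = TfidfVectorizer(max_features=5000)
lexical = tfidf.fit_transform(texts).toarray()

# Contextual features: first-token embedding from the transformer encoder.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    contextual = encoder(**batch).last_hidden_state[:, 0].numpy()

fused = np.concatenate([contextual, lexical], axis=1)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused))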
2024
Exploring Kolmogorov Arnold Networks for Interpretable Mental Health Detection and Classification from Social Media Text
Ajay Surya Jampana | Mohitha Velagapudi | Neethu Mohan | Sachin Kumar S
Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Mental health analysis from social media text demands both high accuracy and interpretability for responsible healthcare applications. This paper explores Kolmogorov Arnold Networks (KANs) for mental health detection and classification, demonstrating their superior accuracy compared to Multi-Layer Perceptrons (MLPs) while requiring fewer parameters. To further enhance interpretability, we leverage the Local Interpretable Model-Agnostic Explanations (LIME) method to identify key features, resulting in a simplified KAN model. This allows us to derive governing equations for each class, providing a deeper understanding of the relationships between texts and mental health conditions.
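A minimal sketch of the interpretability step only: LIME ranking the tokens that drive a text classifier's prediction. The classifier here is a plain TF-IDF + logistic-regression stand-in rather than a KAN, and the texts and class names are placeholders; it only illustrates how LIME surfaces key features as described above.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

texts = ["i feel hopeless and tired", "had a great day with friends"]   # placeholder posts
labels = [1, 0]                                                         # 1 = at-risk, 0 = control

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["control", "at-risk"])
explanation = explainer.explain_instance(
    "i feel so tired of everything", clf.predict_proba, num_features=5
)
print(explanation.as_list())   # (token, weight) pairs ranked by importance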
2023
Improving Reinforcement Learning Agent Training using Text based Guidance: A study using Commands in Dravidian Languages
Nikhil Chowdary Paleti | Sai Aravind Vadlapudi | Sai Aashish Menta | Sai Akshay Menta | Vishnu Vardhan Gorantla V N S L | Janakiram Chandu | Soman K P | Sachin Kumar S
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Reinforcement learning (RL) agents have achieved remarkable success in various domains, such as game-playing and protein structure prediction. However, most RL agents rely on exploration to find optimal solutions without explicit guidance. This paper proposes a methodology for training RL agents using text-based instructions in Dravidian languages (Telugu, Tamil, and Malayalam) as well as in English. The agents are trained in a modified Lunar Lander environment, where they must follow specific paths to successfully land the lander. The methodology involves collecting a dataset of human demonstrations and textual instructions, encoding the instructions into numerical representations using text-based embeddings, and training RL agents using state-of-the-art algorithms. The results demonstrate that the trained Soft Actor-Critic (SAC) agent can effectively understand and generalize instructions in different languages, outperforming other RL algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG).
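A minimal sketch of the text-conditioning idea under stated assumptions, not the paper's training setup: the instruction is embedded once with a multilingual sentence encoder (LaBSE here, an assumption) and concatenated to every environment observation via a wrapper, then SAC from stable-baselines3 is trained on the augmented observations. The environment id depends on the installed gymnasium version and needs the box2d extra; the reward shaping that enforces a specific landing path is omitted.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from sentence_transformers import SentenceTransformer
from stable_baselines3 import SAC

class InstructionWrapper(gym.ObservationWrapper):
    # Append a fixed instruction embedding to every observation.
    def __init__(self, env, instruction_vec):
        super().__init__(env)
        self.instruction_vec = instruction_vec.astype(np.float32)
        low = np.concatenate([env.observation_space.low,
                              np.full(instruction_vec.shape, -np.inf, dtype=np.float32)])
        high = np.concatenate([env.observation_space.high,
                               np.full(instruction_vec.shape, np.inf, dtype=np.float32)])
        self.observation_space = spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        return np.concatenate([obs, self.instruction_vec])

encoder = SentenceTransformer("sentence-transformers/LaBSE")        # assumed text encoder
instruction = encoder.encode("land on the left side of the pad")    # the paper used Telugu, Tamil, Malayalam, and English commands

env = InstructionWrapper(gym.make("LunarLander-v3", continuous=True), instruction)   # id varies by gymnasium version
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)   # toy budget; real training needs far more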
2017
deepCybErNet at EmoInt-2017: Deep Emotion Intensities in Tweets
Vinayakumar R | Premjith B | Sachin Kumar S | Soman KP | Prabaharan Poornachandran
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
This working note presents the methodology used in the deepCybErNet submission to the shared task on Emotion Intensities in Tweets (EmoInt) at WASSA-2017. The goal of the task is to predict a real-valued score in the range [0-1] for a particular tweet with an emotion type. To do this, we used Bag-of-Words and embeddings based on a recurrent network architecture. We developed two systems, and experiments were conducted on the Emotion Intensity Shared Task 1 database at WASSA-2017. The system that uses word embeddings with a recurrent network achieved the highest 5-fold cross-validation accuracy. It uses the embedding with a recurrent network to extract features at the tweet level and logistic regression for prediction. These methods are highly language independent, and experimental results show that the proposed methods are apt for predicting a real-valued score in the range [0-1] for a given tweet with its emotion type.
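A minimal sketch of the second setup described above, assumed rather than taken from the deepCybErNet system: token embeddings feed a recurrent encoder whose final hidden state is mapped through a sigmoid to an intensity score in [0, 1], trained with mean squared error. Vocabulary size, dimensions, and the toy batch are placeholders.

import torch
import torch.nn as nn

class IntensityRegressor(nn.Module):
    # Embedding -> LSTM -> linear -> sigmoid, producing a score in [0, 1].
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, token_ids):
        _, (h, _) = self.rnn(self.emb(token_ids))
        return torch.sigmoid(self.out(h[-1])).squeeze(-1)

model = IntensityRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

tokens = torch.randint(1, 5000, (8, 20))   # batch of 8 toy tweets, 20 token ids each
targets = torch.rand(8)                     # gold intensity scores in [0, 1]
loss = loss_fn(model(tokens), targets)
loss.backward()
optimizer.step()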