Aruna A


2025

KEC-Elite-Analysts@DravidianLangTech 2025: Deciphering Emotions in Tamil-English and Code-Mixed Social Media Tweets
Malliga Subramanian | Aruna A | Anbarasan T | Amudhavan M | Jahaganapathi S | Kogilavani Shanmugavadivel
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Sentiment analysis in code-mixed languages, particularly Tamil-English, is a growing challenge in natural language processing (NLP) due to the prevalence of multilingual communities on social media. This paper explores various machine learning and transformer-based models, including Logistic Regression, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), BERT, and mBERT, for sentiment classification of Tamil-English code-mixed text. The models are evaluated on a shared task dataset provided by DravidianLangTech@NAACL 2025, with performance measured through accuracy, precision, recall, and F1-score. Our results demonstrate that transformer-based models, particularly mBERT, outperform traditional classifiers in identifying sentiment polarity. Future work aims to address the challenges posed by code-switching and class imbalance through advanced model architectures and data augmentation techniques.
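As a rough illustration of the transformer-based approach the abstract describes, the sketch below fine-tunes the multilingual mBERT checkpoint (bert-base-multilingual-cased) on a toy Tamil-English code-mixed sentiment set and reports accuracy, precision, recall, and weighted F1. The inline examples, label set, and hyperparameters are placeholders, not the shared-task data or the authors' exact configuration.

```python
# Minimal sketch: fine-tuning mBERT for Tamil-English code-mixed sentiment
# classification. Data, labels, and hyperparameters are illustrative
# assumptions, not the DravidianLangTech shared-task setup.
import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-multilingual-cased"   # mBERT checkpoint
LABELS = ["negative", "neutral", "positive"]  # assumed polarity labels

# Toy in-memory examples standing in for the shared-task corpus.
train = Dataset.from_dict({
    "text": ["Padam semma mass da", "Worst movie, waste of time"],
    "label": [2, 0],
})
test = Dataset.from_dict({
    "text": ["Okayish padam, not bad"],
    "label": [1],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch the examples.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

train = train.map(tokenize, batched=True)
test = test.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

def compute_metrics(eval_pred):
    # Report the four metrics used in the paper's evaluation.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-sentiment",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
    eval_dataset=test,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```

The same loop can be pointed at a traditional classifier baseline (e.g. TF-IDF features with Logistic Regression or SVM) to reproduce the kind of comparison the abstract reports.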

KEC-Elite-Analysts@LT-EDI 2025: Leveraging Deep Learning for Racial Hoax Detection in Code-Mixed Hindi-English Tweets
Malliga Subramanian | Aruna A | Amudhavan M | Jahaganapathi S | Kogilavani Shanmugavadivel
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Detecting misinformation in code-mixed languages, particularly Hindi-English, poses significant challenges in natural language processing due to the linguistic diversity found on social media. This paper focuses on racial hoax detection—false narratives that target specific communities—within Hindi-English YouTube comments. We evaluate the effectiveness of several machine learning models, including Logistic Regression, Random Forest, Support Vector Machine, Naive Bayes, and Multi-Layer Perceptron, using a dataset of 5,105 annotated comments. Model performance is assessed using accuracy, precision, recall, and F1-score. Experimental results indicate that neural and ensemble models consistently outperform traditional classifiers. Future work will explore the use of transformer-based architectures and data augmentation techniques to enhance detection in low-resource, code-mixed scenarios.
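A minimal sketch of the model comparison the abstract describes, evaluating the five classifiers named (Logistic Regression, Random Forest, SVM, Naive Bayes, MLP) with scikit-learn pipelines over TF-IDF features and reporting accuracy, precision, recall, and weighted F1. The tiny inline corpus, the TF-IDF feature choice, and the resubstitution evaluation are assumptions for illustration, not the authors' annotated dataset or exact pipeline.

```python
# Minimal sketch: comparing classical models on Hindi-English code-mixed
# comments for racial hoax detection. Corpus and features are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy code-mixed comments: 1 = racial hoax, 0 = not a hoax.
texts = [
    "yeh news bilkul fake hai, community ko badnaam kar rahe ho",
    "match kal shaam ko hai, sab log aa jao",
    "in logon ne hi virus failaya, sab jante hain",
    "recipe bahut easy hai, try karo",
]
labels = [1, 0, 1, 0]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "SVM": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
}

for name, clf in models.items():
    pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipeline.fit(texts, labels)        # fit on the full toy set
    preds = pipeline.predict(texts)    # resubstitution, for demonstration only
    p, r, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    print(f"{name}: acc={accuracy_score(labels, preds):.2f} "
          f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```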