Dr. Soman K P


2023

Social Media Data Analysis for Malayalam YouTube Comments: Sentiment Analysis and Emotion Detection using ML and DL Models
Abeera V P | Dr. Sachin Kumar | Dr. Soman K P
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages

In this paper, we present a study of Malayalam YouTube comments as social media data, focusing on two tasks: sentiment analysis and emotion detection. Our research investigates the effectiveness of various machine learning (ML) and deep learning (DL) models on both tasks. For sentiment analysis, we collected a dataset of 3064 comments; for two-class emotion detection, we used a dataset of 817 comments. In the sentiment analysis phase, we explored ML models including Support Vector Machines (SVM), Naïve Bayes, K-Nearest Neighbors (KNN), MLP Classifier, Decision Tree, and Random Forests, along with DL models such as Recurrent Neural Networks (RNN), LSTM, and GRU. To improve model performance, we preprocessed the Malayalam YouTube comments by tokenizing them and removing stop words. Experimental results showed that the DL models achieved higher accuracy than the ML models, which suggests that they are better able to capture the complex patterns and nuances of the Malayalam language. We then extended the analysis to emotion detection, a related social media analysis task for which only limited annotated data was available. For emotion detection, we employed the same ML models used in the sentiment analysis phase. The dataset of 817 comments was annotated with two emotions, Happy and Sad; we trained the models to classify the comments into these classes and compared the accuracy of the different models.
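
The preprocessing and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the toy comments, and the function names `preprocess`, `train_nb`, and `predict_nb` are all hypothetical, and a small hand-written multinomial Naïve Bayes stands in for the library models the paper may have used. Tokens are split on whitespace rather than with a `\w` regular expression, on the assumption that whitespace tokenization is adequate here, because Python's `\w` does not match Malayalam combining vowel signs and would break words apart.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; the paper's actual list is not given.
STOP_WORDS = {"ഈ", "ഒരു", "ആണ്"}


def preprocess(comment, stop_words=STOP_WORDS):
    """Tokenize on whitespace, strip ASCII punctuation, and drop stop words."""
    tokens = (tok.strip(string.punctuation) for tok in comment.split())
    return [tok for tok in tokens if tok and tok not in stop_words]


def train_nb(docs, labels):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy comments invented for this sketch; they are not taken from the paper's dataset.
train = [
    ("ഈ സിനിമ വളരെ നല്ല ആണ്", "positive"),
    ("നല്ല പാട്ട്", "positive"),
    ("ഈ സിനിമ മോശം ആണ്", "negative"),
    ("വളരെ മോശം", "negative"),
]
model = train_nb([preprocess(text) for text, _ in train],
                 [label for _, label in train])

print(predict_nb(model, preprocess("നല്ല സിനിമ")))
print(predict_nb(model, preprocess("മോശം സിനിമ!")))
```

The same `preprocess` step could feed any of the other classifiers named in the abstract; only the model behind `train_nb` and `predict_nb` would change. For the two-class emotion task, the labels would be Happy and Sad instead of sentiment classes.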