Prathvi B
2024
MUCS@DravidianLangTech-2024: A Grid Search Approach to Explore Sentiment Analysis in Code-mixed Tamil and Tulu
Prathvi B | Manavi K | Subrahmanyapoojary K | Asha Hegde | Kavya G | Hosahalli Shashirekha
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Sentiment Analysis (SA) is a field of computational study that analyzes and understands people’s opinions, attitudes, and emotions toward any entity. A review of an entity can be written about an individual, an event, a topic, a product, etc., and such reviews are abundant on social media platforms. The increasing number of social media users and the growing amount of user-generated code-mixed content, such as reviews, comments, and posts, have resulted in a rising demand for efficient tools capable of analyzing such content to detect sentiments. Nevertheless, SA of social media text is challenging because code-mixed text is complex. To address SA in code-mixed Tamil and Tulu text, this paper describes the Machine Learning (ML) models submitted by our team, MUCS, to “Sentiment Analysis in Tamil and Tulu - DravidianLangTech”, a shared task organized at the European Chapter of the Association for Computational Linguistics (EACL) 2024. Two models are proposed: a Linear Support Vector Classifier (LinearSVC) and an ensemble of five ML classifiers (k Nearest Neighbour (kNN), Stochastic Gradient Descent (SGD), Logistic Regression (LR), LinearSVC, and Random Forest Classifier (RFC)) with hard voting, both trained using concatenated features obtained from word and character n-grams vectorized with the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer and CountVectorizer. Further, the grid search algorithm is employed to obtain optimal hyperparameter values. The proposed ensemble model obtained macro F1 scores of 0.260 and 0.550 for Tamil and Tulu respectively.
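The pipeline outlined in the abstract could be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual implementation: the n-gram ranges, the `char_wb` analyzer, the hyperparameter grid, the helper names `build_pipeline` and `tune`, and the toy English sentences are all hypothetical choices that the abstract does not specify.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def build_pipeline(seed=0):
    """Concatenated word/char n-gram features feeding a hard-voting ensemble."""
    # FeatureUnion concatenates the four sparse matrices column-wise.
    # The n-gram ranges and the char_wb analyzer are assumptions.
    features = FeatureUnion([
        ("tfidf_word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("tfidf_char", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
        ("count_word", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("count_char", CountVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
    ])
    # The five classifiers named in the abstract; each casts one vote and
    # the majority label wins (voting="hard").
    ensemble = VotingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=3)),
            ("sgd", SGDClassifier(random_state=seed)),
            ("lr", LogisticRegression(max_iter=1000)),
            ("svc", LinearSVC()),
            ("rfc", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ],
        voting="hard",
    )
    return Pipeline([("features", features), ("clf", ensemble)])


def tune(texts, labels, cv=2):
    """Grid search over a small, hypothetical grid, scored by macro F1."""
    # Nested parameter names follow <pipeline step>__<estimator name>__<param>.
    param_grid = {
        "clf__svc__C": [0.1, 1.0],
        "clf__knn__n_neighbors": [1, 3],
    }
    search = GridSearchCV(build_pipeline(), param_grid, scoring="f1_macro", cv=cv)
    search.fit(texts, labels)
    return search


# Toy placeholder data, invented for illustration only (not from the shared task).
toy_texts = [
    "super movie loved it", "very good song", "nice acting great story",
    "really good and fun", "worst movie ever", "very bad song",
    "boring story poor acting", "really bad and dull",
]
toy_labels = ["Positive"] * 4 + ["Negative"] * 4

search = tune(toy_texts, toy_labels)
print(search.best_params_)
print(search.predict(["good story", "bad acting"]))
```

Scoring the search with `f1_macro` mirrors the macro F1 metric reported in the abstract; a real run would replace the toy lists with the shared-task training data and a broader grid.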