Manavi K


2024

pdf bib
MUCS@DravidianLangTech-2024: Role of Learning Approaches in Strengthening Hate-Alert Systems for code-mixed text
Manavi K | Sonali K | Gauthamraj K | Kavya G | Asha Hegde | Hosahalli Shashirekha
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Hate and offensive language detection is the task of detecting hate and/or offensive content targetting a person or a group of people. Despite many efforts to detect hate and offensive content on social media platforms, the problem remains unsolved till date due to the ever growing social media users and their creativity to create and spread hate and offensive content. To address the automatic detection of hate and offensive content on social media platforms, this paper describes the learning models submitted by our team - MUCS to “Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu): DravidianLangTech@EACL” - a shared task organized at European Chapter of the Association for Computational Linguistics (EACL) 2024 invites the research community to address the challenges of detecting hate and offensive language in Telugu language. In this paper, we - team MUCS, describe the learning models submitted to the above mentioned shared task. Three models: Three models: i) LR model - a Machine Learning (ML) algorithm fed with TF-IDF of n-grams of subword, word and char_wb are in the range (1, 3), (1, 3), and (1, 5), ii) TL- a pretrained BERT models which makes use of Hate-speech-CNERG/bert-base-uncased-hatexplain model and iii) Ensemble model which is the combination of ML classifieres( MNB, LR, GNB) trained CountVectorizer with word and char ngrams of range (1, 3) and (1, 5) respectively. Proposed LR model trained with TF-IDF of subword, word and char n-grams outperformed the other models with macro F1 scores of 0.6501 securing 15th rankin the shared task for Telugu text.

pdf bib
MUCS@DravidianLangTech-2024: A Grid Search Approach to Explore Sentiment Analysis in Code-mixed Tamil and Tulu
Prathvi B | Manavi K | Subrahmanyapoojary K | Asha Hegde | Kavya G | Hosahalli Shashirekha
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Sentiment Analysis (SA) is a field of computational study that analyzes and understands people’s opinions, attitudes, and emotions toward any entity. A review of an entity can be written about an individual, an event, a topic, a product, etc., and such reviews are abundant on social media platforms. The increasing number of social media users and the growing amount of user-generated code-mixed content such as reviews, comments, posts etc., on social media have resulted in a rising demand for efficient tools capable of effectively analyzing such content to detect the sentiments. In spite of this, SA of social media text is challenging because the code-mixed text is complex. To address SA in code-mixed Tamil and Tulu text, this paper describes the Machine Learning (ML) models submitted by our team - MUCS to “Sentiment Analysis in Tamil and Tulu - Dravidian- LangTech” - a shared task organized at European Chapter of the Association for Computational Linguistics (EACL) 2024. Linear Support Vector classifier (LinearSVC) and ensemble of 5 ML classifiers (k Nearest Neighbour (kNN), Stochastic Gradient Descent (SGD), Logistic Regression (LR), LinearSVC, and Random Forest Classifier (RFC)) with hard voting trained using concatenated features obtained from word and character n-ngrams vectoized from Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer and CountVectorizer. Further, Gridsearch algorithm is employed to obtain optimal hyperparameter values.The proposed ensemble model obtained macro F1 scores of 0.260 and 0.550 for Tamil and Tulu languages respectively.