Sonali


2024

pdf bib
MUCS@DravidianLangTech-2024: Role of Learning Approaches in Strengthening Hate-Alert Systems for code-mixed text
Manavi K K | Sonali | Gauthamraj | Kavya G | Asha Hegde | H L Shashirekha
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Hate and offensive language detection is the task of detecting hate and/or offensive content targetting a person or a group of people. Despite many efforts to detect hate and offensive content on social media platforms, the problem remains unsolved till date due to the ever growing social media users and their creativity to create and spread hate and offensive content. To address the automatic detection of hate and offensive content on social media platforms, this paper describes the learning models submitted by our team - MUCS to “Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu): DravidianLangTech@EACL” - a shared task organized at European Chapter of the Association for Computational Linguistics (EACL) 2024 invites the research community to address the challenges of detecting hate and offensive language in Telugu language. In this paper, we - team MUCS, describe the learning models submitted to the above mentioned shared task. Three models: Three models: i) LR model - a Machine Learning (ML) algorithm fed with TF-IDF of n-grams of subword, word and char_wb are in the range (1, 3), (1, 3), and (1, 5), ii) TL- a pretrained BERT models which makes use of Hate-speech-CNERG/bert-base-uncased-hatexplain model and iii) Ensemble model which is the combination of ML classifieres( MNB, LR, GNB) trained CountVectorizer with word and char ngrams of range (1, 3) and (1, 5) respectively. Proposed LR model trained with TF-IDF of subword, word and char n-grams outperformed the other models with macro F1 scores of 0.6501 securing 15th rankin the shared task for Telugu text.

pdf bib
MUCS@LT-EDI-2024: Learning Approaches to Empower Homophobic/Transphobic Comment Identification
Sonali | Nethravathi Gidnakanala | Raksha G | Kavya G | Asha Hegde | H L Shashirekha
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

Homophobic/Transphobic (H/T) content includes hatred and discriminatory comments directed at Lesbian, Gay, Bisexual, Transgender, Queer (LGBTQ) individuals on social media platforms. As this unfavourable perception towards LGBTQ individuals may affect them physically and mentally, it is necessary to detect H/T content on social media. This demands automated tools to identify and address H/T content. In view of this, in this paper, we - team MUCS describe the learning models submitted to “Homophobia/Transphobia Detection in social media comments:LT-EDI@EACL 2024” shared task at European Chapter of the Association for Computational Linguistics (EACL) 2024. The learning models: i) Homo_Ensemble - an ensemble of Machine Learning (ML) algorithms trained with Term Frequency-Inverse Document Frequency (TFIDF) of syllable n-grams in the range (1, 3), ii) Homo_TL - a model based on Transfer Learning (TL) approach with Bidirectional Encoder Representations from Transformers (BERT) models, iii) Homo_probfuse - an ensemble of ML classifiers with soft voting trained using sentence embeddings (except for Hindi), and iv) Homo_FSL - Few-Shot Learning (FSL) models using Sentence Transformer (ST) (only for Tulu), are proposed to detect H/T content in the given languages. Among the models submitted to the shared task, the models that performed better for each language include: i) Homo_Ensemble model obtained macro F1 score of 0.95 securing 4th rank for Telugu language, ii) Homo_TL model obtained macro F1 scores of 0.49, 0.53, 0.45, 0.94, and 0.95 securing 2nd, 2nd, 1st, 1st, and 4th ranks for English, Marathi, Hindi, Kannada, and Gujarathi languages, respectively, iii) Homo_probfuse model obtained macro F1 scores of 0.86, 0.87, and 0.53 securing 2nd, 6th, and 2nd ranks for Tamil, Malayalam, and Spanish languages respectively, and iv) Homo_FSL model obtained a macro F1 score of 0.62 securing 2nd rank for Tulu dataset.