Maria Nancy C


2025

SSN_IT_NLP@DravidianLangTech 2025: Abusive Tamil and Malayalam Text targeting Women on Social Media
Maria Nancy C | Radha N | Swathika R
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The proliferation of social media platforms has resulted in increased instances of online abuse, particularly targeting marginalized groups such as women. This study focuses on the classification of abusive comments in Tamil and Malayalam, two Dravidian languages widely spoken in South India. Leveraging a multilingual BERT model, this paper provides an effective approach for detecting and categorizing abusive and non-abusive text. Using labeled datasets of social media comments, our model demonstrates its ability to identify targeted abuse with promising accuracy. This paper outlines the dataset preparation, model architecture, training methodology, and evaluation of results, providing a foundation for combating online abuse in low-resource languages. The methodology is distinctive in its integration of multilingual BERT with weighted loss functions to address class imbalance, showcasing a pathway for effective abuse detection in other underrepresented languages. The BERT model achieved an F1-score of 0.6519 for Tamil and 0.6601 for Malayalam. The code for this work is available on GitHub (Abusive-Text-targeting-women).
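
A minimal sketch of the technique the abstract names: fine-tuning multilingual BERT with a class-weighted cross-entropy loss to counter class imbalance. The checkpoint name, learning rate, sequence length, and class counts below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
from torch.nn import CrossEntropyLoss
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

# Inverse-frequency class weights to counter class imbalance
# (these counts are hypothetical placeholders, not the shared-task statistics).
class_counts = torch.tensor([7000.0, 3000.0])      # [non-abusive, abusive]
weights = class_counts.sum() / (2 * class_counts)  # minority class gets the larger weight
loss_fn = CrossEntropyLoss(weight=weights)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(texts, labels):
    """One optimization step on a batch of (comment, label) pairs."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    logits = model(**batch).logits               # shape: (batch_size, 2)
    loss = loss_fn(logits, torch.tensor(labels))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Weighting the loss by inverse class frequency penalizes mistakes on the rarer abusive class more heavily, which is a standard remedy when one label dominates the training data.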

SSN_IT_HATE@LT-EDI-2025: Caste and Migration Hate Speech Detection
Maria Nancy C | Radha N | Swathika R
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

This paper proposes a transformer-based methodology for detecting hate speech in Tamil, developed as part of the shared task on Caste and Migration Hate Speech Detection. Leveraging the multilingual BERT (mBERT) model, we fine-tune it to classify Tamil social media content into caste/migration-related hate speech and non-hate-speech categories. Our approach achieves a macro F1-score of 0.72462 on the development dataset, demonstrating the effectiveness of multilingual pretrained models in low-resource language settings. The code for this work is available on GitHub (Hate-Speech Deduction).
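
A minimal sketch of the dev-set evaluation implied above: classifying comments with a fine-tuned mBERT checkpoint and scoring with macro F1 via scikit-learn. The checkpoint path, label encoding, and example data are hypothetical stand-ins for the shared task's development split.

```python
import torch
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "path/to/finetuned-mbert"  # hypothetical local checkpoint path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint).eval()

def predict(texts):
    """Return 0 (non-hate) / 1 (hate) predictions for a list of comments."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()

dev_texts = ["..."]   # development-set comments (placeholder)
dev_labels = [0]      # gold labels (placeholder)
preds = predict(dev_texts)
print("macro F1:", f1_score(dev_labels, preds, average="macro"))
```

Macro F1 averages the per-class F1 scores with equal weight, so a model cannot score well by predicting only the majority class, which is why it is the standard metric for imbalanced shared tasks like this one.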