Md Osama

2025

CUET_INSights@NLU of Devanagari Script Languages 2025: Leveraging Transformer-based Models for Target Identification in Hate Speech
Farjana Alam Tofa | Lorin Tasnim Zeba | Md Osama | Ashim Dey
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

Hate speech detection in multilingual content is a challenging problem, especially when it comes to understanding the specific targets of hateful expressions. Identifying the targets of hate speech, whether directed at individuals, organizations, or communities, is crucial for effective content moderation and for understanding context. A shared task on hate speech detection in Devanagari-script languages, organized by CHiPSAL@COLING 2025, allowed us to address the challenge of identifying the target of hate speech in these languages. For this task, we experimented with various machine learning (ML) and deep learning (DL) models, including Logistic Regression, Decision Trees, Random Forest, SVM, CNN, LSTM, and BiLSTM, as well as transformer-based models such as MiniLM, m-BERT, and Indic-BERT. Our experiments demonstrated that Indic-BERT achieved the highest F1-score, 0.69, which ranked 3rd in the shared task. This research contributes to advancing hate speech detection and natural language processing in low-resource languages.

2024

CUET_NLP_GoodFellows@DravidianLangTech EACL2024: A Transformer-Based Approach for Detecting Fake News in Dravidian Languages
Md Osama | Kawsar Ahmed | Hasan Mesbaul Ali Taher | Jawad Hossain | Shawly Ahsan | Mohammed Moshiul Hoque
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Many people now use Facebook and Twitter, which has increased information sharing and communication. However, a considerable amount of the information on these platforms is misleading or intentionally crafted to deceive users, and is often termed fake news. A shared task on fake news detection in Malayalam, organized by DravidianLangTech@EACL 2024, allowed us to address the challenge of distinguishing between original and fake news content in the Malayalam language. Our approach involves creating an intelligent framework to categorize text as either fake or original. We experimented with various machine learning models, including Logistic Regression, Decision Tree, Random Forest, Multinomial Naive Bayes, SVM, and SGD, and various deep learning models, including CNN, BiLSTM, and BiLSTM + Attention. We also explored Indic-BERT, MuRIL, XLM-R, and m-BERT as transformer-based approaches. Our most successful model, m-BERT, achieved a macro F1 score of 0.85 and ranked 4th in the shared task. This research contributes to combating misinformation on social media by offering an effective way to classify content accurately.

2023

Score_IsAll_You_Need at BLP-2023 Task 1: A Hierarchical Classification Approach to Detect Violence Inciting Text using Transformers
Kawsar Ahmed | Md Osama | Md. Sirajul Islam | Md Taosiful Islam | Avishek Das | Mohammed Moshiul Hoque
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)

Violence-inciting text detection has become critical due to its significance in social media monitoring, online security, and the prevention of violent content. Developing an automatic text classification model for identifying violence in low-resource languages such as Bangla poses significant challenges due to the scarcity of resources and complex morphological structures. This work presents a transformer-based method that classifies Bangla texts into three violence classes: direct, passive, and non-violence. We leveraged transformer models, including BanglaBERT, XLM-R, and m-BERT, to develop a hierarchical classification model for the downstream task. In the first step, BanglaBERT is employed to identify the presence of violence in the text. In the next step, the model classifies the texts that incite violence as either direct or passive. The developed system scored 72.37 and ranked 14th among the participants.
