Tolulope Olalekan Abiola


2025

CIC-NLP@DravidianLangTech 2025: Detecting AI-generated Product Reviews in Dravidian Languages
Tewodros Achamaleh | Tolulope Olalekan Abiola | Lemlem Eyob Kawo | Mikiyas Mebraihtu | Grigori Sidorov
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

AI-generated text now matches human writing so closely that telling the two apart is very difficult. Our CIC-NLP team submitted results to the DravidianLangTech@NAACL 2025 shared task on detecting AI-generated product reviews in Dravidian languages. We performed binary classification with XLM-RoBERTa-Base using the DravidianLangTech@NAACL 2025 datasets provided by the task organizers. The trained model distinguished human-written from AI-generated reviews with scores of 0.96 for Tamil and 0.88 for Malayalam on the evaluation test set. This paper presents detailed information about preprocessing, model architecture, hyperparameter fine-tuning settings, the experimental process, and the results. The source code is available on GitHub.

CIC-NLP at GenAI Detection Task 1: Advancing Multilingual Machine-Generated Text Detection
Tolulope Olalekan Abiola | Tewodros Achamaleh Bizuneh | Fatima Uroosa | Nida Hafeez | Grigori Sidorov | Olga Kolesnikova | Olumide Ebenezer Ojo
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)

Machine-written texts are gradually becoming indistinguishable from human-written texts, creating a need for sophisticated detection methods. Team CIC-NLP presents its work on the GenAI Content Detection Task 1 at the COLING 2025 Workshop. Our work focuses on Subtask B of Task 1, the multilingual binary classification of text as machine-written or human-authored. Using mBERT, we addressed this binary classification task with the dataset provided by the GenAI Detection Task team. mBERT achieved a macro-average F1-score of 0.72 and an accuracy of 0.73.

CIC-NLP at GenAI Detection Task 1: Leveraging DistilBERT for Detecting Machine-Generated Text in English
Tolulope Olalekan Abiola | Tewodros Achamaleh Bizuneh | Oluwatobi Joseph Abiola | Temitope Olasunkanmi Oladepo | Olumide Ebenezer Ojo | Grigori Sidorov | Olga Kolesnikova
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)

As machine-generated texts (MGT) become increasingly similar to human writing, the distinctions between them are harder to identify. In this paper, we, the CIC-NLP team, present our submission to the Gen-AI Content Detection Workshop at COLING 2025 for Task 1 Subtask A, which involves distinguishing between text generated by LLMs and text authored by humans, with an emphasis on detecting English-only MGT. We applied the DistilBERT model to this binary classification task using the dataset provided by the organizers. The fine-tuned model differentiated between the classes, achieving a micro-average F1-score of 0.70 on the evaluation test set. We provide a detailed explanation of the fine-tuning parameters and steps involved in our analysis.

EM-26@LT-EDI 2025: Detecting Racial Hoaxes in Code-Mixed Social Media Data
Tewodros Achamaleh | Fatima Uroosa | Nida Hafeez | Tolulope Olalekan Abiola | Mikiyas Mebraihtu | Sara Getachew | Grigori Sidorov | Rolando Quintero
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

Social media platforms and user-generated content, such as tweets, comments, and blog posts, often contain offensive language, including racial hate speech, personal attacks, and sexual harassment. Detecting such inappropriate language is essential to ensure user safety and to prevent the spread of hateful behavior and online aggression. Approaches based on conventional machine learning and deep learning have shown robust results for high-resource languages like English but struggle with code-mixed text, which is common in bilingual communication. We participated in the shared task “LT-EDI@LDK 2025” organized by DravidianLangTech, applying the BERT-base multilingual cased model and achieving an F1 score of 0.63. These results demonstrate how our model effectively processes and interprets the unique linguistic features of code-mixed content. The source code is available on GitHub.

EM-26@LT-EDI 2025: Caste and Migration Hate Speech Detection in Tamil-English Code-Mixed Social Media Texts
Tewodros Achamaleh | Tolulope Olalekan Abiola | Mikiyas Mebraihtu | Sara Getachew | Grigori Sidorov
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion

In this paper, we describe the system developed by Team EM-26 for the Shared Task on Caste and Migration Hate Speech Detection at LT-EDI@LDK 2025. The task addresses the challenge of recognizing caste-based and migration-related hate speech in social media text written in Tamil, a language that is both nuanced and under-resourced for machine learning. To tackle this, we fine-tuned the multilingual transformer XLM-RoBERTa-Large on the provided training data, leveraging its cross-lingual strengths to detect both explicit and implicit hate speech. To improve performance, we applied social-media-focused preprocessing techniques, including Tamil text normalization and noise removal. Our model achieved a macro F1-score of 0.6567 on the test set, highlighting the effectiveness of multilingual transformers for low-resource hate speech detection. Additionally, we discuss key challenges and errors in Tamil hate speech classification, which may guide future work toward building more ethical and inclusive AI systems. The source code is available on GitHub.