Hamada Nayel

2026

REGLAT at AbjadMed: Handling Imbalanced Arabic Medical Text Classification via Hierarchical KNN-MLP Architecture
Ahmed M. Fetouh | Mohammed Rahmath | Omer Dawood | Mariam Labib | Nsrin Ashraf | Hamada Nayel
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

In this paper, we describe the system submitted to the shared task on medical text classification in Arabic. We propose a single-model approach based on fine-tuned LLM-based embeddings combined with hierarchical classical classifiers, achieving a competitive macro F1-score of 0.46 on the blind test set. We explored various modeling strategies, including tree-based ensembles, LLMs, and hierarchical correction for rare classes, highlighting the effectiveness of domain-specific fine-tuning in low-resource settings. The results demonstrate that a single fine-tuned Arabic BERT variant can serve as a strong baseline in extreme imbalance scenarios, outperforming more complex ensembles in simplicity and reproducibility.

REGLAT at AbjadGenEval: Multi-Model Ensemble Approach for Arabic AI-Generated Text Detection
Mariam Labib | Nsrin Ashraf | Ahmed M. Fetouh | Hamada Nayel
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script

The rapid advancement of large language models necessitates robust methods for detecting AI-generated Arabic text. This paper presents our system for distinguishing human-written from machine-generated Arabic content. We propose a weighted ensemble combining AraBERTv2 and BERT-base-arabic, trained via 5-fold stratified cross-validation with class-balanced loss functions. Our methodology incorporates Arabic text normalization, strategic data augmentation using 16,678 samples from external scientific abstracts, and threshold optimization prioritizing recall. On the official test set, our system achieved an F1-score of 0.763, an accuracy of 0.695, a precision of 0.624, and a recall of 0.980, demonstrating strong detection of machine-generated texts with minimal false negatives at the cost of elevated false positives. Analysis reveals critical insights into precision-recall trade-offs and challenges in cross-domain generalization for Arabic AI text detection.

2025

NAYEL@DravidianLangTech-2025: Character N-gram and Machine Learning Coordination for Fake News Detection in Dravidian Languages
Hamada Nayel | Mohammed Aldawsari | Hosahalli Lakshmaiah Shashirekha
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

This paper describes in detail the model submitted by team NAYEL to the Fake News Detection in Dravidian Languages shared task. The proposed model uses simple character n-gram TF-IDF features integrated with an ensemble of various classical machine learning classification algorithms. Despite its simple structure, the proposed model outperformed more complex models, as the shared task results show. The proposed model achieved an F1-score of 87.5% and secured the 5th rank.

Inside the Box: A Streamlined Model for AI-Generated News Article Detection
Nsrin Ashraf | Mariam Labib | Hamada Nayel
Proceedings of the Shared Task on Multi-Domain Detection of AI-Generated Text

The rapid proliferation of AI-generated text has raised concerns regarding authenticity, authorship, and the spread of misinformation. Detecting such content accurately and efficiently has become a pressing challenge. In this study, we propose a simple yet effective system for classifying AI-generated versus human-written text. Rather than relying on complex or resource-intensive deep learning architectures, our approach leverages classical machine learning algorithms combined with the TF-IDF text representation technique. Evaluated on the M-DAIGT shared task dataset, our Support Vector Machine (SVM) based system achieved strong results, ranking second on the official leaderboard and demonstrating competitive performance across all evaluation metrics. These findings highlight the potential of traditional lightweight models to address modern challenges in text authenticity detection, particularly in low-resource or real-time applications where interpretability and efficiency are essential.

REGLAT at AraGenEval shared task: Morphology-Aware AraBERT for Detecting Arabic AI-Generated Text
Mariam Labib | Nsrin Ashraf | Mohammed Aldawsari | Hamada Nayel
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

REGLAT at MAHED Shared Task: A Hybrid Ensemble-Based System for Arabic Hate Speech Detection
Nsrin Ashraf | Mariam Labib | Tarek Elshishtawy | Hamada Nayel
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

2024

BFCI at AraFinNLP2024: Support Vector Machines for Arabic Financial Text Classification
Nsrin Ashraf | Hamada Nayel | Mohammed Aldawsari | Hosahalli Shashirekha | Tarek Elshishtawy
Proceedings of the Second Arabic Natural Language Processing Conference

This paper describes the system submitted by the BFCAI team to the AraFinNLP2024 shared task. Our team participated in the first subtask, which aims at detecting the customer intents of cross-dialectal Arabic queries in the banking domain. Our system follows the common pipeline of text classification models, using primary classification algorithms integrated with a basic vectorization approach for feature extraction. Multi-Layer Perceptron, Stochastic Gradient Descent and Support Vector Machines algorithms have been implemented, and Support Vector Machines outperformed all other algorithms with an F-score of 49%. Our submission’s result is reasonable given the simplicity of the proposed model’s structure.

2022

BFCAI at SemEval-2022 Task 6: Multi-Layer Perceptron for Sarcasm Detection in Arabic Texts
Nsrin Ashraf | Fathy Elkazzaz | Mohamed Taha | Hamada Nayel | Tarek Elshishtawy
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the systems submitted to the iSarcasm shared task. The aim of iSarcasm is to identify sarcastic content in Arabic and English text. Our team participated in iSarcasm for the Arabic language. A multi-layer machine learning-based model has been submitted for Arabic sarcasm detection. In this model, a TF-IDF vector space has been used for feature representation. The submitted system is simple and does not need any external resources. The test results are encouraging.

BoNC: Bag of N-Characters Model for Word Level Language Identification
Shimaa Ismail | Mai K. Gallab | Hamada Nayel
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts

This paper describes the model submitted by the NLP_BFCAI team for the Kanglish shared task held at ICON 2022. The proposed model used a very simple approach based on the word representation. Simple machine learning classification algorithms, namely Random Forests, Support Vector Machines, Stochastic Gradient Descent and Multi-Layer Perceptron, have been implemented. Our Random Forest (RF) submission ranked fifth among all submissions.

NAYEL @LT-EDI-ACL2022: Homophobia/Transphobia Detection for Equality, Diversity, and Inclusion using SVM
Nsrin Ashraf | Mohamed Taha | Ahmed Taha | Hamada Nayel
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Analysing the contents of social media platforms such as YouTube, Facebook and Twitter has gained interest due to the vast number of users. One of the important tasks is homophobia/transphobia detection. This paper illustrates the system submitted by our team for the homophobia/transphobia detection in social media comments shared task. A machine learning-based model has been designed and various classification algorithms have been implemented for automatic detection of homophobia in YouTube comments. TF/IDF with a bigram range has been used for vectorization of comments. Support Vector Machines have been used to develop the proposed model, and our submission reported weighted F1-scores of 0.91, 0.92 and 0.88 for the English, Tamil and Tamil-English datasets, respectively.

Word Representation Models for Arabic Dialect Identification
Mahmoud Sobhy | Ahmed H. Abu El-Atta | Ahmed A. El-Sawy | Hamada Nayel
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

This paper describes the systems submitted by the BFCAI team to the Nuanced Arabic Dialect Identification (NADI) shared task 2022. The dialect identification task aims at automatically detecting the source variant of a given text or speech segment. There are two subtasks in NADI 2022: the first for country-level identification and the second for sentiment analysis. Our team participated in the first subtask. The proposed systems use Term Frequency/Inverse Document Frequency and word embeddings as vectorization models. Different machine learning algorithms have been used as classifiers. The proposed systems have been tested on two test sets: Test-A and Test-B. The proposed models achieved macro-F1 scores of 21.25% and 9.71% on the Test-A and Test-B sets, respectively. On the other hand, the best-performing submitted system achieved macro-F1 scores of 36.48% and 18.95% on the Test-A and Test-B sets, respectively.

2021

BFCAI at ComMA@ICON 2021: Support Vector Machines for Multilingual Gender Biased and Communal Language Identification
Fathy Elkazzaz | Fatma Sakr | Rasha Orban | Hamada Nayel
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification

This paper presents the system submitted by the BFCAI team to the multilingual gender biased and communal language identification shared task. The proposed model used Support Vector Machines (SVMs) as the classification algorithm. The features have been extracted using a TF/IDF model with unigrams and bigrams. The proposed model is very simple, and no external resources are needed to build it.

Machine Learning-Based Approach for Arabic Dialect Identification
Hamada Nayel | Ahmed Hassan | Mahmoud Sobhi | Ahmed El-Sawy
Proceedings of the Sixth Arabic Natural Language Processing Workshop

This paper describes our systems submitted to the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. There are four subtasks: two for country-level identification and two for province-level identification. The data in this task covers a total of 100 provinces from all 21 Arab countries and comes from the Twitter domain. The proposed systems depend on five machine learning approaches, namely Complement Naïve Bayes, Support Vector Machine, Decision Tree, Logistic Regression and Random Forest classifiers. In terms of macro-averaged F1 score, the Naïve Bayes classifier outperformed all other classifiers on the development and test data.

Machine Learning-Based Model for Sentiment and Sarcasm Detection
Hamada Nayel | Eslam Amer | Aya Allam | Hanya Abdallah
Proceedings of the Sixth Arabic Natural Language Processing Workshop

Within the last few years, the number of Arabic internet users and the volume of Arabic online content have grown exponentially. Dealing with Arabic datasets and the usage of non-explicit sentences to express an opinion are considered to be the major challenges in the field of natural language processing. Hence, sarcasm detection and sentiment analysis have gained major interest from the research community, especially for Arabic. Automatic sarcasm detection and sentiment analysis can be applied using three approaches, namely supervised, unsupervised and hybrid. In this paper, a model based on a supervised machine learning algorithm called Support Vector Machine (SVM) has been used for this process. The proposed model has been evaluated using the ArSarcasm-v2 dataset. The performance of the proposed model has been compared with other models submitted to the sentiment analysis and sarcasm detection shared task.

2020

NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets
Hamada Nayel
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present the system submitted to “SemEval-2020 Task 12”. The proposed system aims to automatically identify offensive language in Arabic tweets. A machine learning-based approach has been used to design our system. We implemented a linear classifier with Stochastic Gradient Descent (SGD) as the optimization algorithm. Our model reported F1-scores of 84.20% and 81.82% on the development set and test set, respectively. The best-performing system and the last-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.