2024
pdf
bib
abs
BFCI at AraFinNLP2024: Support Vector Machines for Arabic Financial Text Classification
Nsrin Ashraf
|
Hamada Nayel
|
Mohammed Aldawsari
|
Hosahalli Shashirekha
|
Tarek Elshishtawy
Proceedings of The Second Arabic Natural Language Processing Conference
In this paper, a description of the system submitted by BFCAI team to the AraFinNLP2024 shared task has been introduced. Our team participated in the first subtask, which aims at detecting the customer intents of cross-dialectal Arabic queries in the banking domain. Our system follows the common pipeline of text classification models using primary classification algorithms integrated with basic vectorization approach for feature extraction. Multi-layer Perceptron, Stochastic Gradient Descent and Support Vector Machines algorithms have been implemented and support vector machines outperformed all other algorithms with an f-score of 49%. Our submission’s result is appropriate compared to the simplicity of the proposed model’s structure.
2022
pdf
bib
abs
BoNC: Bag of N-Characters Model for Word Level Language Identification
Shimaa Ismail
|
Mai K. Gallab
|
Hamada Nayel
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts
This paper describes the model submitted by NLP_BFCAI team for Kanglish shared task held at ICON 2022. The proposed model used a very simple approach based on the word representation. Simple machine learning classification algorithms, Random Forests, Support Vector Machines, Stochastic Gradient Descent and Multi-Layer Perceptron have been imple- mented. Our submission, RF, securely ranked fifth among all other submissions.
pdf
bib
abs
NAYEL @LT-EDI-ACL2022: Homophobia/Transphobia Detection for Equality, Diversity, and Inclusion using SVM
Nsrin Ashraf
|
Mohamed Taha
|
Ahmed Abd Elfattah
|
Hamada Nayel
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
Analysing the contents of social media platforms such as YouTube, Facebook and Twitter gained interest due to the vast number of users. One of the important tasks is homophobia/transphobia detection. This paper illustrates the system submitted by our team for the homophobia/transphobia detection in social media comments shared task. A machine learning-based model has been designed and various classification algorithms have been implemented for automatic detection of homophobia in YouTube comments. TF/IDF has been used with a range of bigram model for vectorization of comments. Support Vector Machines has been used to develop the proposed model and our submission reported 0.91, 0.92, 0.88 weighted f1-score for English, Tamil and Tamil-English datasets respectively.
pdf
bib
abs
BFCAI at SemEval-2022 Task 6: Multi-Layer Perceptron for Sarcasm Detection in Arabic Texts
Nsrin Ashraf
|
Fathy Elkazzaz
|
Mohamed Taha
|
Hamada Nayel
|
Tarek Elshishtawy
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
This paper describes the systems submitted to iSarcasm shared task. The aim of iSarcasm is to identify the sarcastic contents in Arabic and English text. Our team participated in iSarcasm for the Arabic language. A multi-Layer machine learning based model has been submitted for Arabic sarcasm detection. In this model, a vector space TF-IDF has been used as for feature representation. The submitted system is simple and does not need any external resources. The test results show encouraging results.
pdf
bib
abs
Word Representation Models for Arabic Dialect Identification
Mahmoud Sobhy
|
Ahmed H. Abu El-Atta
|
Ahmed A. El-Sawy
|
Hamada Nayel
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
This paper describes the systems submitted by BFCAI team to Nuanced Arabic Dialect Identification (NADI) shared task 2022. Dialect identification task aims at detecting the source variant of a given text or speech segment automatically. There are two subtasks in NADI 2022, the first subtask for country-level identification and the second subtask for sentiment analysis. Our team participated in the first subtask. The proposed systems use Term Frequency Inverse/Document Frequency and word embeddings as vectorization models. Different machine learning algorithms have been used as classifiers. The proposed systems have been tested on two test sets: Test-A and Test-B. The proposed models achieved Macro-f1 score of 21.25% and 9.71% for Test-A and Test-B set respectively. On other hand, the best-performed submitted system achieved Macro-f1 score of 36.48% and 18.95% for Test-A and Test-B set respectively.
2021
pdf
bib
abs
BFCAI at ComMA@ICON 2021: Support Vector Machines for Multilingual Gender Biased and Communal Language Identification
Fathy Elkazzaz
|
Fatma Sakr
|
Rasha Orban
|
Hamada Nayel
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification
This paper presents the system that has been submitted to the multilingual gender biased and communal language identification shared task by BFCAI team. The proposed model used Support Vector Machines (SVMs) as a classification algorithm. The features have been extracted using TF/IDF model with unigram and bigram. The proposed model is very simple and there are no external resources are needed to build the model.
pdf
bib
abs
Machine Learning-Based Approach for Arabic Dialect Identification
Hamada Nayel
|
Ahmed Hassan
|
Mahmoud Sobhi
|
Ahmed El-Sawy
Proceedings of the Sixth Arabic Natural Language Processing Workshop
This paper describes our systems submitted to the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. There are four subtasks, two subtasks for country-level identification and the other two subtasks for province-level identification. The data in this task covers a total of 100 provinces from all 21 Arab countries and come from the Twitter domain. The proposed systems depend on five machine-learning approaches namely Complement Naïve Bayes, Support Vector Machine, Decision Tree, Logistic Regression and Random Forest Classifiers. F1 macro-averaged score of Naïve Bayes classifier outperformed all other classifiers for development and test data.
pdf
bib
abs
Machine Learning-Based Model for Sentiment and Sarcasm Detection
Hamada Nayel
|
Eslam Amer
|
Aya Allam
|
Hanya Abdallah
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Within the last few years, the number of Arabic internet users and Arabic online content is in exponential growth. Dealing with Arabic datasets and the usage of non-explicit sentences to express an opinion are considered to be the major challenges in the field of natural language processing. Hence, sarcasm and sentiment analysis has gained a major interest from the research community, especially in this language. Automatic sarcasm detection and sentiment analysis can be applied using three approaches, namely supervised, unsupervised and hybrid approach. In this paper, a model based on a supervised machine learning algorithm called Support Vector Machine (SVM) has been used for this process. The proposed model has been evaluated using ArSarcasm-v2 dataset. The performance of the proposed model has been compared with other models submitted to sentiment analysis and sarcasm detection shared task.
2020
pdf
bib
abs
NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets
Hamada Nayel
Proceedings of the Fourteenth Workshop on Semantic Evaluation
In this paper, we present the system submitted to “SemEval-2020 Task 12”. The proposed system aims at automatically identify the Offensive Language in Arabic Tweets. A machine learning based approach has been used to design our system. We implemented a linear classifier with Stochastic Gradient Descent (SGD) as optimization algorithm. Our model reported 84.20%, 81.82% f1-score on development set and test set respectively. The best performed system and the system in the last rank reported 90.17% and 44.51% f1-score on test set respectively.
2017
pdf
bib
Improving NER for Clinical Texts by Ensemble Approach using Segment Representations
Hamada Nayel
|
H. L. Shashirekha
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)