Michael Ibrahim


2025

pdf bib
CUFE@NLU of Devanagari Script Languages 2025: Language Identification using fastText
Michael Ibrahim
Proceedings of the First Workshop on Challenges in Processing South Asian Languages (CHiPSAL 2025)

Language identification is a critical area of research within natural language processing (NLP), particularly in multilingual contexts where accurate language detection can enhance the performance of various applications, such as machine translation, content moderation, and user interaction systems. This paper presents a language identification system developed using fastText. In the CHIPSAL@COLING 2025 Task on Devanagari Script Language Identification, the proposed method achieved first place, with an F1 score of 0.9997.

pdf bib
CUFE@VarDial 2025 NorSID: Multilingual BERT for Norwegian Dialect Identification and Intent Detection
Michael Ibrahim
Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects

Dialect identification is crucial in enhancing various tasks, including sentiment analysis, as a speaker’s geographical origin can significantly affect their perspective on a topic, also, intent detection has gained significant traction in natural language processing due to its applications in various domains, including virtual assistants, customer service automation, and information retrieval systems. This work describes a system developed for VarDial 2025: Norwegian slot and intent detection and dialect identification shared task (Scherrer et al., 2025), a challenge designed to address the dialect recognition and intent detection problems for a low-resource language like Norwegian. More specifically, this work investigates the performance of different BERT models in solving this problem. Finally, the output of the multilingual version of the BERT model was submitted to this shared task, the developed system achieved a weighted F1 score of 79.64 for dialect identification and an accuracy of 94.38 for intent detection.

2024

pdf bib
CUFE at NADI 2024 shared task: Fine-Tuning Llama-3 To Translate From Arabic Dialects To Modern Standard Arabic
Michael Ibrahim
Proceedings of The Second Arabic Natural Language Processing Conference

LLMs such as GPT-4 and LLaMA excel in multiple natural language processing tasks, however, LLMs face challenges in delivering satisfactory performance on low-resource languages due to limited availability of training data. In this paper, LLaMA-3 with 8 Billion parameters is finetuned to translate among Egyptian, Emirati, Jordanian, Palestinian Arabic dialects, and Modern Standard Arabic (MSA). In the NADI 2024 Task on DA-MSA Machine Translation, the proposed method achieved a BLEU score of 21.44 when it was fine-tuned on thedevelopment dataset of the NADI 2024 Task on DA-MSA and a BLEU score of 16.09 when trained when it was fine-tuned using the OSACT dataset.

pdf bib
CUFE at StanceEval2024: Arabic Stance Detection with Fine-Tuned Llama-3 Model
Michael Ibrahim
Proceedings of The Second Arabic Natural Language Processing Conference

In NLP, stance detection identifies a writer’s position or viewpoint on a particular topic or entity from their text and social media activity, which includes preferences and relationships.Researchers have been exploring techniques and approaches to develop effective stance detection systems.Large language models’ latest advancements offer a more effective solution to the stance detection problem. This paper proposes fine-tuning the newly released 8B-parameter Llama 3 model from Meta GenAI for Arabic text stance detection.The proposed method was ranked ninth in the StanceEval 2024 Task on stance detection in Arabic language achieving a Macro average F1 score of 0.7647.