Firas Al Mahrouqi
2026
OMAN-SPEECH: A Multi-Layer Annotated Speech Corpus for Omani Arabic Dialects
Rayyan S. Al Khadhuri | Firas Al Mahrouqi | Salim Al Mandhari | Amir Azad Al-Kathiri | Omar Said Alshahri | Ghassab Mansoor Alsaqr | Badri Abdulhakim Mudhsh | Tarek Fatnassi
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
Automatic Speech Recognition (ASR) has achieved strong performance in high-resource languages; however, Dialectal Arabic remains significantly under-resourced. This gap is particularly evident in Oman, where Arabic exhibits substantial sociolinguistic variation shaped by settlement patterns between sedentary (Hadari) and nomadic (Badu) communities, variation that is often overlooked by urban-centric or generalized Gulf Arabic datasets. We introduce OMAN-SPEECH, a sociolinguistically stratified spoken corpus for Omani Arabic comprising approximately 40 hours of spontaneous and semi-spontaneous speech from 32 speakers across 11 Wilayats (provinces). The corpus is balanced to capture regional and lifestyle variation and is annotated at the sentence level, through a human-in-the-loop annotation pipeline, with Arabic transcription, English translation, and phonetic transcription in the International Phonetic Alphabet (IPA). OMAN-SPEECH provides a foundational resource for evaluating ASR and related speech technologies on Omani and Gulf Arabic varieties and supports more granular modeling of regional dialectal variation.
2025
Lab17 @ Ahasis Shared Task 2025: Fine-Tuning and Prompting techniques for Sentiment Analysis of Saudi and Darija Dialects
Al Mukhtar Al Hadhrami | Firas Al Mahrouqi | Mohammed Al Shaaili | Hala Mulki
Proceedings of the Shared Task on Sentiment Analysis for Arabic Dialects
In this paper, we describe our contribution to the Ahasis shared task: Sentiment Analysis on Arabic Dialects in the Hospitality Domain. Within the presented framework, we explored two learning strategies, one tailored to a Large Language Model (LLM) and the other to Transformer-based model variants. Few-shot prompting was used with GPT-4o, while fine-tuning was applied twice: first to the base MARBERT model on the Ahasis dataset, and then to SODA-BERT, a MARBERT variant that was pretrained on an Omani sentiment dataset and later evaluated on the shared task data.