Fouad Shammary
2025
Enhancing Dialectal Arabic Intent Detection through Cross-Dialect Multilingual Input Augmentation
Shehenaz Hossain, Fouad Shammary, Bahaulddin Shammary, Haithem Afli
Proceedings of the 4th Workshop on Arabic Corpus Linguistics (WACL-4)
Addressing the challenges of Arabic intent detection amid extensive dialectal variation, this study presents a cross-dialectal, multilingual approach for classifying intents in banking and migration contexts. By augmenting dialectal inputs with Modern Standard Arabic (MSA) and English translations, our method leverages cross-lingual context to improve classification accuracy. We evaluate single-input (dialect-only), dual-input (dialect + MSA), and triple-input (dialect + MSA + English) models, applying language-specific tokenization to each input. Results demonstrate that, on the migration dataset, our model achieved an accuracy gain of over 50 percentage points on the Tunisian dialect, increasing from 43.3% with dialect-only input to 94% with the full multilingual setup. Similarly, on the PAL (Palestinian dialect) dataset, accuracy improved from 87.7% to 93.5% with translation augmentation, a gain of 5.8 percentage points. These findings underscore the effectiveness of our approach for intent detection across various Arabic dialects.
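The single-, dual-, and triple-input setups can be illustrated with a minimal Python sketch. It assumes that each available view (dialect, MSA, English) is tokenized by its own tokenizer and that the resulting token sequences are joined with `[CLS]`/`[SEP]` markers. The function `build_input`, the marker tokens, the example sentences, and the whitespace-based stand-in tokenizers are hypothetical placeholders for illustration, not the authors' implementation, which would use real subword tokenizers.

```python
from typing import Callable, Dict, List, Optional

Tokenizer = Callable[[str], List[str]]


def arabic_tokenizer(text: str) -> List[str]:
    # Hypothetical stand-in for an Arabic (dialect or MSA) subword tokenizer.
    return text.split()


def english_tokenizer(text: str) -> List[str]:
    # Hypothetical stand-in for an English tokenizer: lowercase, then split.
    return text.lower().split()


# One tokenizer per input view (assumed mapping).
TOKENIZERS: Dict[str, Tokenizer] = {
    "dialect": arabic_tokenizer,
    "msa": arabic_tokenizer,
    "english": english_tokenizer,
}


def build_input(segments: Dict[str, Optional[str]],
                tokenizers: Dict[str, Tokenizer],
                order=("dialect", "msa", "english"),
                cls: str = "[CLS]", sep: str = "[SEP]") -> List[str]:
    """Tokenize each available view with its own tokenizer and concatenate."""
    tokens = [cls]
    for view in order:
        text = segments.get(view)
        if not text:
            continue  # missing views give the single- or dual-input setups
        tokens.extend(tokenizers[view](text))
        tokens.append(sep)  # close each view with a separator token
    return tokens


if __name__ == "__main__":
    # Illustrative triple input: Tunisian dialect, MSA, and English views.
    print(build_input({
        "dialect": "نحب نحل حساب",
        "msa": "أريد فتح حساب",
        "english": "I want to open an account",
    }, TOKENIZERS))
```

Dropping the `"msa"` or `"english"` entry yields the dual- or single-input variant without changing the function, which keeps the three configurations directly comparable.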
2022
TF-IDF or Transformers for Arabic Dialect Identification? ITFLOWS participation in the NADI 2022 Shared Task
Fouad Shammary, Yiyi Chen, Zsolt T Kardkovacs, Mehwish Alam, Haithem Afli
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
This study targets the Nuanced Arabic Dialect Identification (NADI) shared task, organized with the Workshop on Arabic Natural Language Processing (WANLP). It focuses on Subtask 1, the country-level identification of Arabic dialects. Specifically, it first studies the impact of a traditional approach, TF-IDF, and then examines advanced deep-learning-based methods. These methods include full fine-tuning of MARBERT as well as adapter-based fine-tuning of MARBERT, with and without data augmentation. The evaluation shows that the traditional TF-IDF approach achieves the best accuracy on the TEST-A dataset, while MARBERT fine-tuned with adapters on augmented data ranks second in Macro F1-score on the TEST-B dataset. As a result, the proposed system ranked second on average in the shared task.
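A TF-IDF baseline for country-level dialect identification can be sketched in plain Python. The abstract does not specify the features or the classifier, so this sketch assumes character n-gram features (2 to 4 characters), scikit-learn-style smoothed IDF weighting, and a nearest-centroid classifier using cosine similarity. These choices, and the names `char_ngrams` and `TfidfCentroidClassifier`, are illustrative assumptions rather than the system submitted to NADI 2022.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> List[str]:
    # Pad with spaces so word boundaries become part of the n-grams.
    padded = f" {text} "
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class TfidfCentroidClassifier:
    """TF-IDF character n-grams + nearest centroid (assumed design)."""

    def fit(self, texts: List[str], labels: List[str]) -> "TfidfCentroidClassifier":
        docs = [Counter(char_ngrams(t)) for t in texts]
        n_docs = len(docs)
        df = Counter(g for d in docs for g in d)  # document frequency per n-gram
        # Smoothed IDF: log((1 + N) / (1 + df)) + 1
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
        # Sum the normalized TF-IDF vectors of each class.
        sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        for d, y in zip(docs, labels):
            for g, w in self._vectorize(d).items():
                sums[y][g] += w
        # Re-normalize so the dot product below equals cosine similarity.
        self.centroids = {y: self._normalize(v) for y, v in sums.items()}
        return self

    def _vectorize(self, counts: Counter) -> Dict[str, float]:
        # Raw term frequency times IDF; unseen n-grams are ignored.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        return self._normalize(vec)

    @staticmethod
    def _normalize(vec: Dict[str, float]) -> Dict[str, float]:
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else dict(vec)

    def predict(self, text: str) -> str:
        q = self._vectorize(Counter(char_ngrams(text)))

        def cosine(centroid: Dict[str, float]) -> float:
            # Both vectors are L2-normalized, so the dot product is the cosine.
            return sum(w * centroid.get(g, 0.0) for g, w in q.items())

        return max(self.centroids, key=lambda y: cosine(self.centroids[y]))
```

A typical call is `TfidfCentroidClassifier().fit(train_texts, train_labels).predict(text)`. In practice, scikit-learn's `TfidfVectorizer` paired with a linear classifier would be the more usual choice; the standard-library version only makes the weighting explicit.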
Co-authors
- Haithem Afli (2 papers)
- Mehwish Alam (1 paper)
- Yiyi Chen (1 paper)
- Shehenaz Hossain (1 paper)
- Zsolt T Kardkovacs (1 paper)