Capturing Intra-Dialectal Variation in Qatari Arabic: A Corpus of Cultural and Gender Dimensions

Houda Bouamor, Sara Al-Emadi, Zeinab Ibrahim, Hany Fazzaa, Aisha Al-Sultan


Abstract
We present the first publicly available, multidimensional corpus of Qatari Arabic that captures intra-dialectal variation between Urban and Bedouin speakers. Although it is often grouped under the label “Gulf Arabic”, Qatari Arabic exhibits rich phonological, lexical, and discourse-level differences shaped by gender, age, and sociocultural identity. Our dataset includes aligned speech and transcriptions from 255 speakers, stratified by gender and age, collected through structured interviews on culturally salient topics such as education, heritage, and social norms. The corpus reveals systematic variation in pronunciation, vocabulary, and narrative style, offering insights for both sociolinguistic analysis and computational modeling. We also demonstrate its utility through preliminary experiments on dialect and gender prediction. As a large-scale, demographically balanced resource, the corpus lays a foundation for both sociolinguistic research and the development of dialect-aware NLP systems.
Anthology ID:
2025.arabicnlp-main.18
Volume:
Proceedings of The Third Arabic Natural Language Processing Conference
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari
Venue:
ArabicNLP
Publisher:
Association for Computational Linguistics
Pages:
219–230
URL:
https://aclanthology.org/2025.arabicnlp-main.18/
Cite (ACL):
Houda Bouamor, Sara Al-Emadi, Zeinab Ibrahim, Hany Fazzaa, and Aisha Al-Sultan. 2025. Capturing Intra-Dialectal Variation in Qatari Arabic: A Corpus of Cultural and Gender Dimensions. In Proceedings of The Third Arabic Natural Language Processing Conference, pages 219–230, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Capturing Intra-Dialectal Variation in Qatari Arabic: A Corpus of Cultural and Gender Dimensions (Bouamor et al., ArabicNLP 2025)
PDF:
https://aclanthology.org/2025.arabicnlp-main.18.pdf