Building a Corpus of Qatari Arabic Expressions

Sara Al-Mulla, Wajdi Zaghouani


Abstract
Current Arabic natural language processing resources are mainly built to address Modern Standard Arabic (MSA), with only scattered efforts to build resources for Arabic dialects such as Levantine and Egyptian. We observed a lack of resources for Gulf Arabic, and especially for the Qatari variety. In this paper, we present the first corpus of Qatari idioms and expressions, comprising 1000 entries. The corpus was created from online and printed sources, in addition to transcribed recorded interviews, and covers various traditional Qatari expressions and idioms. Audio recordings were collected through interviews, and an online survey questionnaire was conducted to validate the data. This corpus aims to help advance dialectal Arabic speech and natural language processing tools and applications for the Qatari dialect.
Anthology ID: 2020.osact-1.4
Volume: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
Month: May
Year: 2020
Address: Marseille, France
Editors: Hend Al-Khalifa, Walid Magdy, Kareem Darwish, Tamer Elsayed, Hamdy Mubarak
Venue: OSACT
Publisher: European Language Resource Association
Pages: 24–31
Language: English
URL: https://aclanthology.org/2020.osact-1.4
Cite (ACL): Sara Al-Mulla and Wajdi Zaghouani. 2020. Building a Corpus of Qatari Arabic Expressions. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 24–31, Marseille, France. European Language Resource Association.
Cite (Informal): Building a Corpus of Qatari Arabic Expressions (Al-Mulla & Zaghouani, OSACT 2020)
PDF: https://aclanthology.org/2020.osact-1.4.pdf