Noor Abo Mokh

2024

Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Second Arabic Natural Language Processing Conference

We study dependency parsing for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi). Since no syntactically annotated data exist for Arabic dialects, we train the parser on a Modern Standard Arabic (MSA) corpus, which creates an out-of-domain setting. We investigate methods to close the gap between the source (MSA) and target data (dialects), e.g., by training on sentences that are syntactically similar to the test data. For testing, we manually annotate a small data set from a dialectal corpus. We focus on two linguistic phenomena that are difficult to parse: Idafa and coordination. We find that we can improve results by adding in-domain MSA data, while adding dialectal embeddings results in only minor improvements.

2022

Improving POS Tagging for Arabic Dialects on Out-of-Domain Texts
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

We investigate part-of-speech tagging for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi) in an out-of-domain setting. More specifically, we look at the effectiveness of 1) upsampling the target dialect in the training data of a joint model, 2) increasing the consistency of the annotations, and 3) using word embeddings pre-trained on a large corpus of dialectal Arabic. We increase the accuracy on average by about 20 percentage points.

Co-authors

Daniel Dakota 2
Sandra Kübler 2
