Noor Abo Mokh


2022

Improving POS Tagging for Arabic Dialects on Out-of-Domain Texts
Noor Abo Mokh | Daniel Dakota | Sandra Kübler
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

We investigate part-of-speech tagging for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi) in an out-of-domain setting. Specifically, we examine the effectiveness of 1) upsampling the target dialect in the training data of a joint model, 2) increasing the consistency of the annotations, and 3) using word embeddings pre-trained on a large corpus of dialectal Arabic. On average, we increase accuracy by about 20 percentage points.