Robert Nabil
2019
Morphology-aware Word-Segmentation in Dialectal Arabic Adaptation of Neural Machine Translation
Ahmed Tawfik | Mahitab Emam | Khaled Essam | Robert Nabil | Hany Hassan
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Parallel corpora available for building machine translation (MT) models for dialectal Arabic (DA) are rather limited. This scarcity has prompted the use of the abundant Modern Standard Arabic (MSA) resources to complement the limited dialectal resources. However, clitics often differ between MSA and DA. This paper compares morphology-aware DA word segmentation with other word segmentation approaches, such as Byte Pair Encoding (BPE) and Sub-word Regularization (SR). A set of experiments conducted on Egyptian Arabic (EA), Levantine Arabic (LA), and Gulf Arabic (GA) shows that a sufficiently accurate morphology-aware segmentation used in conjunction with BPE outperforms the other word segmentation approaches.