Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study

Noor Mokh, Daniel Dakota, Sandra Kübler


Abstract
We study dependency parsing for four Arabic dialects (Gulf, Levantine, Egyptian, and Maghrebi). Since no syntactically annotated data exist for Arabic dialects, we train the parser on a Modern Standard Arabic (MSA) corpus, which creates an out-of-domain setting. We investigate methods to close the gap between the source data (MSA) and the target data (dialects), e.g., by training on sentences that are syntactically similar to the test data. For testing, we manually annotate a small data set from a dialectal corpus. We focus on two linguistic phenomena that are difficult to parse: Idafa and coordination. We find that adding in-domain MSA data improves results, while adding dialectal embeddings yields only minor improvements.
Anthology ID:
2024.arabicnlp-1.16
Volume:
Proceedings of The Second Arabic Natural Language Processing Conference
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
Venues:
ArabicNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
170–182
URL:
https://aclanthology.org/2024.arabicnlp-1.16
Cite (ACL):
Noor Mokh, Daniel Dakota, and Sandra Kübler. 2024. Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study. In Proceedings of The Second Arabic Natural Language Processing Conference, pages 170–182, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Out-of-Domain Dependency Parsing for Dialects of Arabic: A Case Study (Mokh et al., ArabicNLP-WS 2024)
PDF:
https://aclanthology.org/2024.arabicnlp-1.16.pdf