Cross-Dialectal Named Entity Recognition in Arabic

Niama El Khbir; Urchade Zaratiana; Nadi Tomeh; Thierry Charnois

doi:10.18653/v1/2023.arabicnlp-1.12

Abstract

In this paper, we study the transferability of Named Entity Recognition (NER) models between Arabic dialects. This question matters because manually annotated resources are unevenly distributed across dialects: Modern Standard Arabic (MSA) is much richer in resources than the other dialects, for which few or no datasets exist. How well does an NER model trained on MSA perform on other dialects? To answer this question, we construct four datasets. The first is an MSA dataset extracted from the ACE 2005 corpus. The others are Egyptian, Moroccan and Syrian datasets, which we manually annotate following the ACE guidelines. We train a span-based NER model on top of a pretrained language model (PLM) encoder on the MSA data and study its performance on the dialectal datasets in a zero-shot setting. We compare multiple PLM encoders from the literature and show that they achieve acceptable performance on the dialects without any annotation effort for those dialects. Our annotations and models are publicly available (https://github.com/niamaelkhbir/Arabic-Cross-Dialectal-NER).
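The abstract names two technical ingredients: a span-based NER model and a zero-shot evaluation on dialectal test sets. The sketch below illustrates, under stated assumptions, what those usually involve: enumerating candidate spans up to a maximum width, decoding the highest-scoring non-overlapping spans, and scoring predictions with exact-match entity-level F1. The maximum width, the 0.5 threshold, greedy decoding, and the helper names (enumerate_spans, greedy_decode, entity_f1) are all hypothetical choices for illustration, not the authors' confirmed method. The span probabilities are toy values standing in for the output of a classifier over PLM span representations.

```python
from typing import Dict, List, Set, Tuple

# (start token, end token inclusive, entity type), e.g. ACE types PER, GPE, ORG
Span = Tuple[int, int, str]


def enumerate_spans(n_tokens: int, max_width: int) -> List[Tuple[int, int]]:
    """All candidate (start, end) spans of at most max_width tokens, end inclusive."""
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i, min(i + max_width, n_tokens))
    ]


def greedy_decode(span_scores: Dict[Tuple[int, int], Tuple[str, float]],
                  threshold: float = 0.5) -> Set[Span]:
    """Keep the highest-scoring non-overlapping spans whose probability passes threshold.

    Assumption: span_scores maps (start, end) -> (best entity label, probability);
    in a real span-based model these would come from a classifier over PLM
    representations of each span. Greedy decoding is a hypothetical choice here.
    """
    candidates = sorted(
        ((s, e, lab, p) for (s, e), (lab, p) in span_scores.items() if p >= threshold),
        key=lambda c: -c[3],
    )
    chosen: Set[Span] = set()
    used: Set[int] = set()
    for s, e, lab, _ in candidates:
        tokens = set(range(s, e + 1))
        if tokens & used:
            continue  # overlaps a higher-scoring span that was already selected
        chosen.add((s, e, lab))
        used |= tokens
    return chosen


def entity_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 (boundaries and type must match)."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy example: "Mohamed visited the city of Fez" (MSA). Scores are invented.
tokens = ["زار", "محمد", "مدينة", "فاس"]
candidates = enumerate_spans(len(tokens), max_width=2)
scores = {
    (0, 1): ("PER", 0.30),  # below threshold, discarded
    (1, 1): ("PER", 0.92),
    (2, 3): ("GPE", 0.81),  # wins over (3, 3) and causes a boundary error
    (3, 3): ("GPE", 0.77),
}
assert all(span in candidates for span in scores)

gold = {(1, 1, "PER"), (3, 3, "GPE")}
pred = greedy_decode(scores)
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact-match scoring counts the overlapping but mis-bounded GPE prediction as both a false positive and a false negative, which is one reason boundary differences between MSA and dialectal phrasing can weigh heavily on zero-shot scores.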

Anthology ID: 2023.arabicnlp-1.12
Volume: Proceedings of ArabicNLP 2023
Month: December
Year: 2023
Address: Singapore (Hybrid)
Editors: Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues: ArabicNLP | WS
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 140–149
URL: https://aclanthology.org/2023.arabicnlp-1.12/
DOI: 10.18653/v1/2023.arabicnlp-1.12
Cite (ACL): Niama Elkhbir, Urchade Zaratiana, Nadi Tomeh, and Thierry Charnois. 2023. Cross-Dialectal Named Entity Recognition in Arabic. In Proceedings of ArabicNLP 2023, pages 140–149, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal): Cross-Dialectal Named Entity Recognition in Arabic (Elkhbir et al., ArabicNLP 2023)
PDF: https://aclanthology.org/2023.arabicnlp-1.12.pdf
