A Semi-Supervised BERT Approach for Arabic Named Entity Recognition

Chadi Helwe, Ghassan Dib, Mohsen Shamas, Shady Elbassuoni


Abstract
Named entity recognition (NER) plays a significant role in many applications, such as information extraction, information retrieval, question answering, and even machine translation. Most of the work on NER using deep learning has been done for non-Arabic languages such as English and French, and only a few studies have focused on Arabic. This paper proposes a semi-supervised learning approach to train a BERT-based NER model using labeled and semi-labeled datasets. We compare our approach against various baselines and state-of-the-art Arabic NER tools on three datasets: AQMAR, NEWS, and TWEETS. We report a significant improvement in F-measure on the AQMAR and NEWS datasets, which are written in Modern Standard Arabic (MSA), and competitive results on the TWEETS dataset, which consists of tweets that are mostly in the Egyptian dialect and contain many mistakes and misspellings.
Anthology ID:
2020.wanlp-1.5
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
49–57
URL:
https://aclanthology.org/2020.wanlp-1.5
Cite (ACL):
Chadi Helwe, Ghassan Dib, Mohsen Shamas, and Shady Elbassuoni. 2020. A Semi-Supervised BERT Approach for Arabic Named Entity Recognition. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 49–57, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
A Semi-Supervised BERT Approach for Arabic Named Entity Recognition (Helwe et al., WANLP 2020)
PDF:
https://aclanthology.org/2020.wanlp-1.5.pdf