ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

Mustafa Jarrar; Ahmet Birim; Mohammed Khalilia; Mustafa Erden; Sana Ghanem

doi:10.18653/v1/2023.arabicnlp-1.22

Abstract

This paper presents ArBanking77, a large Arabic dataset for intent detection in the banking domain. The dataset was arabized and localized from the original English Banking77 dataset, expanding its 13,083 queries into 31,404 queries in both Modern Standard Arabic (MSA) and Palestinian dialect, with each query classified into one of 77 classes (intents). We also present a neural model, based on AraBERT and fine-tuned on ArBanking77, which achieved F1-scores of 0.9209 and 0.8995 on MSA and Palestinian dialect, respectively. We performed extensive experiments simulating low-resource settings, in which the model is trained on a subset of the data and augmented with noisy queries that simulate the colloquial terms, mistakes, and misspellings found in real NLP systems, especially live chat queries. The data and the models are publicly available at https://sina.birzeit.edu/arbanking77.
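The low-resource and noise simulation mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names `add_noise` and `subsample`, the choice of character-level delete/duplicate/swap operations, the noise rate, and the example query are all assumptions made here for illustration.

```python
import random


def add_noise(query: str, rate: float, rng: random.Random) -> str:
    """Randomly delete, duplicate, or swap characters to mimic typos.

    Hypothetical helper: the paper's actual noise scheme may differ.
    """
    chars = list(query)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(["delete", "duplicate", "swap"])
            if op == "delete":
                i += 1  # drop this character
                continue
            if op == "duplicate":
                out.extend([chars[i], chars[i]])  # repeat this character
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])  # transpose neighbours
                i += 2
                continue
        out.append(chars[i])  # keep the character unchanged
        i += 1
    return "".join(out)


def subsample(examples, fraction: float, rng: random.Random):
    """Keep a random fraction of (query, intent) pairs (low-resource setting)."""
    k = max(1, int(len(examples) * fraction))
    return rng.sample(examples, k)


if __name__ == "__main__":
    rng = random.Random(13)
    # Made-up example pair, not taken from the dataset.
    data = [("أين بطاقتي الجديدة؟", "card_arrival")] * 10
    small = subsample(data, 0.3, rng)
    noisy = [(add_noise(q, 0.1, rng), intent) for q, intent in small]
    print(len(noisy), noisy[0])
```

A seeded `random.Random` instance is passed in explicitly so that a given subset and its perturbations can be reproduced across runs.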

Anthology ID: 2023.arabicnlp-1.22
Volume: Proceedings of ArabicNLP 2023
Month: December
Year: 2023
Address: Singapore (Hybrid)
Editors: Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues: ArabicNLP | WS
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 276–287
URL: https://aclanthology.org/2023.arabicnlp-1.22/
DOI: 10.18653/v1/2023.arabicnlp-1.22
Cite (ACL): Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem. 2023. ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic. In Proceedings of ArabicNLP 2023, pages 276–287, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal): ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic (Jarrar et al., ArabicNLP 2023)
PDF: https://aclanthology.org/2023.arabicnlp-1.22.pdf