Arabic Topic Classification in the Generative and AutoML Era

Doha Albared, Hadi Hamoud, Fadi Zaraket


Abstract
Most recent models for Arabic topic classification fine-tune existing pre-trained transformer models and target a limited number of categories. More recently, advances in automated ML and generative models have opened new possibilities for the task. While these approaches work for English, it remains an open question whether they perform well for low-resource languages, Arabic in particular. This paper presents (i) ArBoNeClass, a novel Arabic dataset with an extended 14-topic class set covering modern books from the social sciences and humanities along with newspaper articles, and (ii) a set of topic classifiers built from it. We fine-tuned an open LLM to build ArGTClass and compared its performance against the best models built with Vertex AI (Google), AutoML (H2O), and AutoTrain (Hugging Face). ArGTClass outperformed the Vertex AI and AutoML models and performed reasonably close to the AutoTrain model.
Anthology ID:
2023.arabicnlp-1.32
Volume:
Proceedings of ArabicNLP 2023
Month:
December
Year:
2023
Address:
Singapore (Hybrid)
Editors:
Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
Venues:
ArabicNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
399–404
URL:
https://aclanthology.org/2023.arabicnlp-1.32
DOI:
10.18653/v1/2023.arabicnlp-1.32
Cite (ACL):
Doha Albared, Hadi Hamoud, and Fadi Zaraket. 2023. Arabic Topic Classification in the Generative and AutoML Era. In Proceedings of ArabicNLP 2023, pages 399–404, Singapore (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Arabic Topic Classification in the Generative and AutoML Era (Albared et al., ArabicNLP-WS 2023)
PDF:
https://aclanthology.org/2023.arabicnlp-1.32.pdf