Functional Text Dimensions for Arabic Text Classification

Zeyd Ferhat; Abir Betka; Riyadh Barka; Zineddine Kahhoul; Selma Boutiba; Mohamed Tiar; Habiba Dahmani; Ahmed Abdelali

doi:10.18653/v1/2024.arabicnlp-1.29

Abstract

Text classification is of paramount importance in a wide range of applications, including information retrieval, information extraction, and sentiment analysis. The challenge of classifying and labelling text genres, especially in web-based corpora, has received considerable attention. The frequent absence of unambiguous genre information complicates the identification of text types. To address these issues, the Functional Text Dimensions (FTD) method has been introduced to provide a universal set of categories for text classification. This study presents the Arabic Functional Text Dimensions Corpus (AFTD Corpus), a carefully curated collection of documents for evaluating text classification in Arabic. The AFTD Corpus, which we are making available to the community, consists of 3400 documents spanning 17 class categories. Through a comprehensive evaluation using traditional machine learning and neural models, we assess the effectiveness of the FTD approach in the Arabic context. CAMeLBERT, a state-of-the-art model, achieved an F1 score of 0.81 on our corpus. This research highlights the potential of the FTD method for improving text classification, especially for Arabic content, and underlines the importance of robust classification models in web applications.

Anthology ID: 2024.arabicnlp-1.29
Volume: Proceedings of the Second Arabic Natural Language Processing Conference
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
Venues: ArabicNLP | WS
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 352–360
URL: https://aclanthology.org/2024.arabicnlp-1.29/
DOI: 10.18653/v1/2024.arabicnlp-1.29
Cite (ACL): Zeyd Ferhat, Abir Betka, Riyadh Barka, Zineddine Kahhoul, Selma Boutiba, Mohamed Tiar, Habiba Dahmani, and Ahmed Abdelali. 2024. Functional Text Dimensions for Arabic Text Classification. In Proceedings of the Second Arabic Natural Language Processing Conference, pages 352–360, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Functional Text Dimensions for Arabic Text Classification (Ferhat et al., ArabicNLP 2024)
PDF: https://aclanthology.org/2024.arabicnlp-1.29.pdf