Improved Generalization of Arabic Text Classifiers

Alaa Khaddaj, Hazem Hajj, Wassim El-Hajj


Abstract
While transfer learning for text has been an active research area for English, progress for Arabic has been slow, including in the use of Domain Adaptation (DA). Domain adaptation aims to improve the generalization of a classifier by balancing its accuracy on a given task across different text domains. In this paper, we propose and evaluate two variants of a domain adaptation technique: the first is a base model called the Domain Adversarial Neural Network (DANN), and the second is a variant that incorporates representation learning. Following previous approaches, we propose using the proxy A-distance as a metric to assess the success of generalization. We test the performance of the models on ArSentD-LEV, a multi-topic dataset collected from the Levantine countries. We show the superiority of the proposed method in accuracy and robustness when dealing with the Arabic language.
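The abstract names the proxy A-distance (PAD) as the metric for judging generalization. A common definition, used in the DANN literature, is d_A = 2(1 − 2ε), where ε is the held-out error of a classifier trained to tell source-domain feature vectors apart from target-domain ones: values near 2 mean the domains are easily separable, and values near 0 mean the features are close to domain-invariant. The following is a minimal standard-library sketch under our own assumptions: the nearest-centroid domain classifier, the per-domain split, and the names `proxy_a_distance` and `test_fraction` are hypothetical and not taken from the paper (the literature commonly uses a linear classifier such as an SVM for this step).

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _split(points: Sequence[Vector], test_fraction: float,
           rng: random.Random) -> Tuple[List[Vector], List[Vector]]:
    """Shuffle one domain's points and cut them into train/test parts."""
    shuffled = list(points)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1.0 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def proxy_a_distance(source: Sequence[Vector], target: Sequence[Vector],
                     test_fraction: float = 0.5, seed: int = 0) -> float:
    """Estimate the proxy A-distance 2 * (1 - 2 * err) between two domains.

    err is the held-out error of a classifier trained to separate source
    features from target features. Here that classifier is a hypothetical
    nearest-centroid rule, an illustrative stand-in for e.g. a linear SVM.
    """
    if len(source) < 2 or len(target) < 2:
        raise ValueError("need at least two examples per domain")
    rng = random.Random(seed)
    # Split each domain separately so both appear in train and test.
    src_train, src_test = _split(source, test_fraction, rng)
    tgt_train, tgt_test = _split(target, test_fraction, rng)
    if not (src_train and src_test and tgt_train and tgt_test):
        raise ValueError("test_fraction leaves an empty train or test split")
    c_src, c_tgt = _centroid(src_train), _centroid(tgt_train)

    def predict(x: Vector) -> int:
        # 1 = target domain, 0 = source domain; ties go to source.
        return 1 if _sq_dist(x, c_tgt) < _sq_dist(x, c_src) else 0

    test = [(x, 0) for x in src_test] + [(x, 1) for x in tgt_test]
    errors = sum(1 for x, domain in test if predict(x) != domain)
    err = errors / len(test)
    return 2.0 * (1.0 - 2.0 * err)


if __name__ == "__main__":
    src = [(0.1 * i, 0.0) for i in range(10)]
    tgt = [(10.0 + 0.1 * i, 10.0) for i in range(10)]
    print(proxy_a_distance(src, tgt))  # 2.0: domains perfectly separable
    same = [(1.0, 1.0)] * 10
    print(proxy_a_distance(same, same))  # 0.0: domains indistinguishable
```

In a hypothetical use, the function would be applied to feature vectors from two topics of the dataset, once before and once after adversarial training; a lower estimate afterwards would suggest more domain-invariant features. This illustrates the metric only and does not reproduce the paper's experiments.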
Anthology ID: W19-4618
Volume: Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month: August
Year: 2019
Address: Florence, Italy
Editors: Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue: WANLP
Publisher: Association for Computational Linguistics
Pages: 167–174
URL: https://aclanthology.org/W19-4618/
DOI: 10.18653/v1/W19-4618
Cite (ACL): Alaa Khaddaj, Hazem Hajj, and Wassim El-Hajj. 2019. Improved Generalization of Arabic Text Classifiers. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 167–174, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): Improved Generalization of Arabic Text Classifiers (Khaddaj et al., WANLP 2019)
PDF: https://aclanthology.org/W19-4618.pdf
Data: ArSentD-LEV