Alaa Khaddaj
2019
Improved Generalization of Arabic Text Classifiers
Alaa Khaddaj | Hazem Hajj | Wassim El-Hajj
Proceedings of the Fourth Arabic Natural Language Processing Workshop
While transfer learning for text has been very active in the English language, progress in Arabic has been slow, including the use of Domain Adaptation (DA). Domain Adaptation is used to generalize the performance of any classifier by trying to balance the classifier’s accuracy for a particular task among different text domains. In this paper, we propose and evaluate two variants of a domain adaptation technique: the first is a base model called Domain Adversarial Neural Network (DANN), while the second is a variation that incorporates representational learning. Similar to previous approaches, we propose the use of proxy A-distance as a metric to assess the success of generalization. We make use of ArSentDLEV, a multi-topic dataset collected from the Levantine countries, to test the performance of the models. We show the superiority of the proposed method in accuracy and robustness when dealing with the Arabic language.
2018
EMA at SemEval-2018 Task 1: Emotion Mining for Arabic
Gilbert Badaro | Obeida El Jundi | Alaa Khaddaj | Alaa Maarouf | Raslan Kain | Hazem Hajj | Wassim El-Hajj
Proceedings of the 12th International Workshop on Semantic Evaluation
While significant progress has been achieved for Opinion Mining in Arabic (OMA), very limited effort has been put towards the task of emotion mining in Arabic. In fact, businesses are interested in learning a fine-grained representation of how users feel about their products or services. In this work, we describe the methods used by the team Emotion Mining in Arabic (EMA) as part of SemEval-2018 Task 1 for Affect Mining for Arabic tweets. EMA participated in all 5 subtasks. For the 5 subtasks, several preprocessing steps were evaluated, and the best system included diacritics removal, elongation adjustment, replacement of emojis by the corresponding Arabic word, character normalization, and light stemming. Moreover, several features were evaluated along with different classification and regression techniques. For all 5 subtasks, word embedding features performed best in combination with an ensemble technique. EMA achieved 1st place in subtask 5 and 3rd place in subtasks 1 and 3.
Co-authors
- Wassim El-Hajj 2
- Hazem Hajj 2
- Gilbert Badaro 1
- Obeida El Jundi 1
- Raslan Kain 1