This paper describes Faheem (adj. of understand), our submission to the NADI (Nuanced Arabic Dialect Identification) shared task. Because many Arabic dialects remain under-studied owing to the scarcity of resources, the objective is to identify, at the country level, the Arabic dialect used in a tweet. We propose a machine learning approach in which we extract word-level n-gram (n = 1 to 3) and tf-idf features and feed them to six different classifiers. We train the system on a data set of 21,000 tweets, provided by the organizers, covering twenty-one Arab countries. Our top-performing classifiers are Logistic Regression, Support Vector Machines, and Multinomial Naïve Bayes.
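The feature pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tf-idf variant, the preprocessing, or the classifier settings, so the smoothed idf formula, the add-one smoothing, and the class name `TfidfNaiveBayes` are assumptions made for illustration. Only one of the six classifiers (Multinomial Naïve Bayes) is shown, written with the Python standard library, and the transliterated training phrases are invented toy examples rather than NADI data.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word-level n-grams of `text` for n in [n_min, n_max]."""
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over tf-idf weighted word n-grams (toy sketch)."""

    def fit(self, texts, labels):
        docs = [Counter(word_ngrams(t)) for t in texts]
        n_docs = len(docs)
        # Document frequency: number of documents containing each n-gram.
        df = Counter(g for d in docs for g in d)
        # Assumed smoothed idf: ln((1 + N) / (1 + df)) + 1.
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0
                    for g, c in df.items()}
        self.vocab = set(self.idf)
        # Accumulate the tf-idf mass of every n-gram per class.
        self.class_weights = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for g, tf in doc.items():
                self.class_weights[label][g] += tf * self.idf[g]
        class_counts = Counter(labels)
        self.log_prior = {y: math.log(c / n_docs)
                          for y, c in class_counts.items()}
        self.class_total = {y: sum(self.class_weights[y].values())
                            for y in class_counts}
        return self

    def predict(self, text):
        # n-grams never seen during training are ignored.
        feats = Counter(g for g in word_ngrams(text) if g in self.vocab)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for y, prior in self.log_prior.items():
            score = prior
            for g, tf in feats.items():
                # Add-one (Laplace) smoothing over the n-gram vocabulary.
                p = ((self.class_weights[y][g] + 1.0)
                     / (self.class_total[y] + vocab_size))
                score += tf * self.idf[g] * math.log(p)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Invented, transliterated toy phrases (hypothetical, not taken from NADI).
train_texts = [
    "izzayak 3amel eh",
    "eh el akhbar ya basha",
    "kifak shu akhbarak",
    "shu fi ma fi",
]
train_labels = ["Egypt", "Egypt", "Lebanon", "Lebanon"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("3amel eh ya basha"))
print(model.predict("kifak shu akhbarak"))
```

In practice a library such as scikit-learn would likely replace the hand-written classes, and the same features could then be passed to Logistic Regression or a Support Vector Machine for comparison.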
Arib@QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling Error Detection and Correction
Nouf AlShenaifi | Rehab AlNefie | Maha Al-Yahya | Hend Al-Khalifa
Proceedings of the Second Workshop on Arabic Natural Language Processing