Nitin Nikamanth Appiah Balaji
2020
Semi-supervised Fine-grained Approach for Arabic dialect detection task
Nitin Nikamanth Appiah Balaji | Bharathi B
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Because Arabic has numerous dialects, it is important to devise a technique that distinguishes each dialect efficiently. This paper focuses on fine-grained country-level and province-level classification of Arabic dialects. The experiments reported here are our submissions to the NADI 2020 shared task on dialect detection. We study several text feature extraction techniques, including TF-IDF, AraVec, multilingual BERT, and fastText embedding models. We propose a text-embedding-based model, trained with a semi-supervised learning approach, that achieves a macro-average F1 score of 0.2232 on Task 1 and 0.0483 on Task 2.
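For readers unfamiliar with the reported metric, the following minimal sketch shows how a macro-average F1 score can be computed. It is an illustration only, not the authors' evaluation code: the function name `macro_f1` and the toy country labels are hypothetical, and the official NADI scoring script may differ in detail. Macro averaging gives every class equal weight, which helps explain why scores on fine-grained tasks with many classes tend to be low even when accuracy looks reasonable.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a classifier that always predicts the majority class.
y_true = ["Egypt", "Egypt", "Egypt", "Iraq", "Morocco"]
y_pred = ["Egypt", "Egypt", "Egypt", "Egypt", "Egypt"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case the majority-class predictor is right on three of five examples, yet two of the three classes receive an F1 of zero, so the macro average is pulled well below the accuracy.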