Abualsoud Hanani
2018
Birzeit Arabic Dialect Identification System for the 2018 VarDial Challenge
Rabee Naser | Abualsoud Hanani
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
This paper describes our Arabic Dialect Identification (ADI) system for the VarDial 2018 challenge, whose goal is to distinguish four major Arabic dialects as well as Modern Standard Arabic (MSA). The ADI VarDial 2018 training and development data consists of 16,157 utterances, their word transcriptions, their phonetic transcriptions obtained with four non-Arabic phoneme recognizers, and acoustic embedding data. Our overall system is a combination of four different systems. One system uses the word transcriptions and tries to recognize the speaker's dialect by modeling the sequence of words for each dialect. Another system tries to recognize the dialect by modeling the phone sequences produced by the non-Arabic phone recognizers, whereas the other two systems use GMMs trained on the acoustic features. The best performance was achieved by the fused system, which combines all four systems, with a micro-averaged F1 of 68.77%.
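The abstract does not say how the four systems are fused, so the following is only a minimal sketch of one common option, a weighted average of per-dialect scores followed by an argmax. The function names, the label strings, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

Scores = Dict[str, float]  # maps a dialect label to one system's score


def fuse_scores(system_scores: List[Scores],
                weights: Optional[List[float]] = None) -> Scores:
    """Weighted average of per-dialect scores across systems (assumed fusion rule)."""
    if not system_scores:
        raise ValueError("at least one system is required")
    if weights is None:
        weights = [1.0] * len(system_scores)  # equal weights by default
    if len(weights) != len(system_scores):
        raise ValueError("one weight per system is required")
    total = sum(weights)
    labels = system_scores[0].keys()
    return {
        label: sum(w * s[label] for w, s in zip(weights, system_scores)) / total
        for label in labels
    }


def predict(system_scores: List[Scores],
            weights: Optional[List[float]] = None) -> str:
    """Return the dialect label with the highest fused score."""
    fused = fuse_scores(system_scores, weights)
    return max(fused, key=fused.get)


# Toy scores for one utterance from four hypothetical systems
# (illustrative labels for four dialects plus MSA).
example = [
    {"EGY": 0.5, "GLF": 0.2, "LAV": 0.1, "NOR": 0.1, "MSA": 0.1},
    {"EGY": 0.2, "GLF": 0.4, "LAV": 0.2, "NOR": 0.1, "MSA": 0.1},
    {"EGY": 0.4, "GLF": 0.3, "LAV": 0.1, "NOR": 0.1, "MSA": 0.1},
    {"EGY": 0.3, "GLF": 0.3, "LAV": 0.2, "NOR": 0.1, "MSA": 0.1},
]
print(predict(example))
```

In practice the weights would typically be tuned on the development data rather than left equal; that choice is likewise an assumption here.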
2017
Identifying dialects with textual and acoustic cues
Abualsoud Hanani | Aziz Qaroush | Stephen Taylor
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
We describe several systems for identifying short samples of Arabic or Swiss-German dialects, which were prepared for the shared task of the 2017 DSL Workshop (Zampieri et al., 2017). The Arabic data comprises both text and acoustic files, and our best run combined both. The Swiss-German data is text-only. Coincidentally, our best runs achieved an accuracy of nearly 63% on both the Swiss-German and Arabic dialect tasks.
2016
Classifying ASR Transcriptions According to Arabic Dialect
Abualsoud Hanani | Aziz Qaroush | Stephen Taylor
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
We describe several systems for identifying short samples of Arabic dialects. The systems were prepared for the shared task of the 2016 DSL Workshop. Our best system, an SVM using character trigram features, achieved an accuracy of 0.4279 on the test data, compared to a baseline of 0.20 for chance guesses, or 0.2279 had we always chosen the most frequent class in the test set. By comparison, the team with the best weighted F1 score achieved an accuracy of 0.5117. The team entries seem to fall into cohorts, with all the teams in a cohort within one standard deviation of each other; our three entries are in the third cohort, which is about seven standard deviations from the top.
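As a rough illustration of the quantities mentioned above, the sketch below extracts character trigram counts of the kind an SVM could consume, and computes the two baselines. It does not train an SVM, and the space padding, function names, and toy labels are assumptions, not details from the paper. The 0.20 chance figure is consistent with uniform guessing over five classes.

```python
from collections import Counter
from typing import List


def char_trigrams(text: str) -> Counter:
    """Count overlapping character trigrams, padding with one space per side (assumed)."""
    padded = f" {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def chance_baseline(num_classes: int) -> float:
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_classes


def majority_baseline(labels: List[str]) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Toy usage with made-up inputs.
features = char_trigrams("abc")
toy_labels = ["a", "a", "b", "c"]
print(features, chance_baseline(5), majority_baseline(toy_labels))
```

Each `Counter` can be turned into a sparse feature vector for a linear classifier; how the original system weighted or normalized these counts is not stated in the abstract.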