Classifying ASR Transcriptions According to Arabic Dialect

Abualsoud Hanani, Aziz Qaroush, Stephen Taylor


Abstract
We describe several systems for identifying short samples of Arabic dialects. The systems were prepared for the shared task of the 2016 DSL Workshop. Our best system, an SVM using character trigram features, achieved an accuracy of 0.4279 on the task's test data, compared to a baseline of 0.20 for chance guesses, or 0.2279 for always choosing the most frequent class in the test set. By comparison, the team with the best weighted F1 score achieved an accuracy of 0.5117. The team entries seem to fall into cohorts, with all the teams in a cohort within one standard deviation of each other; our three entries are in the third cohort, which is about seven standard deviations from the top.
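As a rough illustration of two ideas in the abstract, the sketch below counts overlapping character trigrams (the kind of feature the best system is described as using) and computes the two baselines mentioned: chance accuracy and majority-class accuracy. It is not the authors' code. The function names, the toy string, and the toy labels are hypothetical, and the five-class setting is an assumption inferred from the 0.20 chance figure.

```python
from collections import Counter


def char_trigrams(text):
    """Count overlapping character trigrams in a string."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over num_classes."""
    return 1.0 / num_classes


def majority_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Hypothetical toy labels for illustration only, not the shared-task data.
toy_labels = ["d1", "d2", "d2", "d3", "d4", "d5", "d2", "d1"]

print(char_trigrams("salam"))         # trigrams: sal, ala, lam (one each)
print(chance_accuracy(5))             # 0.2, assuming five classes
print(majority_accuracy(toy_labels))  # 0.375, since d2 is 3 of 8 labels
```

In a full system, trigram counts like these would be turned into feature vectors and passed to an SVM; that step is omitted here because the abstract does not specify the toolkit or settings used.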
Anthology ID:
W16-4817
Volume:
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi
Venue:
VarDial
Publisher:
The COLING 2016 Organizing Committee
Pages:
126–134
URL:
https://aclanthology.org/W16-4817/
Cite (ACL):
Abualsoud Hanani, Aziz Qaroush, and Stephen Taylor. 2016. Classifying ASR Transcriptions According to Arabic Dialect. In Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pages 126–134, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Classifying ASR Transcriptions According to Arabic Dialect (Hanani et al., VarDial 2016)
PDF:
https://aclanthology.org/W16-4817.pdf