Birzeit Arabic Dialect Identification System for the 2018 VarDial Challenge

Rabee Naser, Abualsoud Hanani


Abstract
This paper describes our Arabic Dialect Identification (ADI) system for the VarDial 2018 challenge, whose goal is to distinguish four major Arabic dialects as well as Modern Standard Arabic (MSA). The ADI VarDial 2018 training and development data consist of 16,157 utterances, their word transcriptions, their phonetic transcriptions obtained with four non-Arabic phoneme recognizers, and acoustic embedding data. Our overall system is a combination of four different systems. One system uses the word transcriptions and tries to recognize the speaker's dialect by modeling the sequence of words for each dialect. Another system tries to recognize the dialect by modeling the phone sequences produced by the non-Arabic phone recognizers, whereas the other two systems use GMMs trained on the acoustic features. The best performance was achieved by the fused system, which combines all four systems, with a micro-averaged F1 of 68.77%.
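The abstract does not say how the four systems are fused, so the following is only a minimal sketch of one common option, a weighted average of per-dialect scores, together with the micro-averaged F1 used as the reported metric. The dialect labels, the function names (`fuse_scores`, `predict`, `micro_f1`), the equal default weights, and all score values are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical label set: four dialects plus MSA (labels are assumed).
DIALECTS = ["EGY", "GLF", "LAV", "NOR", "MSA"]


def fuse_scores(system_scores, weights=None):
    """Weighted average of per-dialect scores from several systems.

    system_scores: list of dicts mapping dialect label -> score.
    weights: optional list of floats, one per system (defaults to equal).
    """
    n = len(system_scores)
    if weights is None:
        weights = [1.0 / n] * n
    return {
        d: sum(w * scores[d] for w, scores in zip(weights, system_scores))
        for d in DIALECTS
    }


def predict(system_scores, weights=None):
    """Return the dialect with the highest fused score."""
    fused = fuse_scores(system_scores, weights)
    return max(fused, key=fused.get)


def micro_f1(gold, pred):
    """Micro-averaged F1 for single-label classification.

    Each wrong prediction counts as one false positive (for the predicted
    class) and one false negative (for the gold class), so micro-F1 equals
    accuracy in this setting.
    """
    tp = sum(g == p for g, p in zip(gold, pred))
    fp = fn = len(gold) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up scores for one utterance from four hypothetical systems:
# word-based, phone-based, and two acoustic GMM systems.
word_sys = {"EGY": 0.6, "GLF": 0.1, "LAV": 0.1, "NOR": 0.1, "MSA": 0.1}
phone_sys = {"EGY": 0.3, "GLF": 0.4, "LAV": 0.1, "NOR": 0.1, "MSA": 0.1}
gmm_sys_a = {"EGY": 0.5, "GLF": 0.2, "LAV": 0.1, "NOR": 0.1, "MSA": 0.1}
gmm_sys_b = {"EGY": 0.4, "GLF": 0.3, "LAV": 0.1, "NOR": 0.1, "MSA": 0.1}

fused_label = predict([word_sys, phone_sys, gmm_sys_a, gmm_sys_b])
score = micro_f1(["EGY", "GLF", "MSA", "LAV"], ["EGY", "GLF", "MSA", "NOR"])
```

In this toy example the phone-based system alone would pick a different dialect, but the fused decision follows the majority of the evidence. In practice the fusion weights would be tuned on the development data rather than set equal.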
Anthology ID:
W18-3924
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
210–217
URL:
https://aclanthology.org/W18-3924/
Cite (ACL):
Rabee Naser and Abualsoud Hanani. 2018. Birzeit Arabic Dialect Identification System for the 2018 VarDial Challenge. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 210–217, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Birzeit Arabic Dialect Identification System for the 2018 VarDial Challenge (Naser & Hanani, VarDial 2018)
PDF:
https://aclanthology.org/W18-3924.pdf