Arabic Dialect Identification Using iVectors and ASR Transcripts

Shervin Malmasi, Marcos Zampieri


Abstract
This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017. The goal of the task is to evaluate computational models that identify the dialect of Arabic utterances using both audio data and text transcriptions. The ADI shared task dataset included Modern Standard Arabic (MSA) and four Arabic dialects: Egyptian, Gulf, Levantine, and North African. The three systems submitted by MAZA are based on combinations of multiple machine learning classifiers, arranged as (1) a voting ensemble, (2) a mean probability ensemble, and (3) a meta-classifier. The best results were obtained by the meta-classifier, which achieved 71.7% accuracy and ranked second among the six teams that participated in the ADI shared task.
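The abstract names three ways of combining base classifiers. As a rough illustration only, the Python sketch below applies each scheme to made-up class-probability outputs from three hypothetical base classifiers for a single utterance. The label codes, the probability values, the toy training set, and the nearest-centroid learner standing in for the meta-classifier are all assumptions; the abstract does not specify the base classifiers or the meta-learner the authors actually used.

```python
from collections import Counter

# Hypothetical label codes for MSA and the four dialects.
LABELS = ["EGY", "GLF", "LAV", "MSA", "NOR"]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def vote(prob_vectors):
    """(1) Plurality voting over each base classifier's top label."""
    counts = Counter(argmax(p) for p in prob_vectors)
    # Ties go to the lowest label index, an arbitrary choice for this sketch.
    best = max(sorted(counts), key=counts.__getitem__)
    return LABELS[best]


def mean_probability(prob_vectors):
    """(2) Average the class probabilities, then take the top label."""
    n = len(prob_vectors)
    mean = [sum(col) / n for col in zip(*prob_vectors)]
    return LABELS[argmax(mean)], mean


def flatten(prob_vectors):
    """Concatenate all base-classifier outputs into one feature vector."""
    return [x for p in prob_vectors for x in p]


def train_meta(train_outputs, train_labels):
    """(3) Fit a meta-classifier on base-classifier outputs.

    Nearest centroid is a hypothetical stand-in learner chosen for brevity.
    """
    counts = Counter(train_labels)
    sums = {}
    for outputs, label in zip(train_outputs, train_labels):
        feats = flatten(outputs)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, x in enumerate(feats):
            acc[i] += x
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_meta(centroids, prob_vectors):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    feats = flatten(prob_vectors)

    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[label]))

    return min(centroids, key=sq_dist)


# Made-up outputs of three hypothetical base classifiers for one utterance.
outputs = [
    [0.05, 0.40, 0.35, 0.10, 0.10],
    [0.10, 0.40, 0.30, 0.10, 0.10],
    [0.02, 0.02, 0.90, 0.03, 0.03],
]

# Toy meta-training set (two utterances with one-hot base outputs) so the example stays tiny.
centroids = train_meta(
    [[[0, 1, 0, 0, 0]] * 3, [[0, 0, 1, 0, 0]] * 3],
    ["GLF", "LAV"],
)

print(vote(outputs))                     # GLF
print(mean_probability(outputs)[0])      # LAV
print(predict_meta(centroids, outputs))  # LAV
```

The two simpler schemes can disagree: voting picks GLF because two of the three classifiers narrowly prefer it, whereas averaging probabilities picks LAV because the third classifier is very confident. A meta-classifier instead learns from held-out data how much to trust each base classifier's output.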
Anthology ID: W17-1222
Volume: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)
Month: April
Year: 2017
Address: Valencia, Spain
Editors: Preslav Nakov, Marcos Zampieri, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 178–183
URL: https://aclanthology.org/W17-1222
DOI: 10.18653/v1/W17-1222
Cite (ACL): Shervin Malmasi and Marcos Zampieri. 2017. Arabic Dialect Identification Using iVectors and ASR Transcripts. In Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pages 178–183, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal): Arabic Dialect Identification Using iVectors and ASR Transcripts (Malmasi & Zampieri, VarDial 2017)
PDF: https://aclanthology.org/W17-1222.pdf