No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects

Chiyu Zhang, Muhammad Abdul-Mageed


Abstract
We present our deep learning system submitted to MADAR shared task 2, which focuses on Twitter user dialect identification. We develop tweet-level identification models based on GRUs and BERT in supervised and semi-supervised settings. We then introduce a simple yet effective method of porting tweet-level labels to the level of users. Our system ranks first in the competition, with a macro F1 score of 71.70% and an accuracy of 77.40%.
Anthology ID:
W19-4637
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
279–284
URL:
https://aclanthology.org/W19-4637
DOI:
10.18653/v1/W19-4637
Cite (ACL):
Chiyu Zhang and Muhammad Abdul-Mageed. 2019. No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 279–284, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects (Zhang & Abdul-Mageed, WANLP 2019)
PDF:
https://aclanthology.org/W19-4637.pdf