JHU System Description for the MADAR Arabic Dialect Identification Shared Task

Tom Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee


Abstract
Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found that several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved the best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests that the various architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.
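To make the character-level language model baseline mentioned in the abstract more concrete, the sketch below trains one character n-gram model per label and assigns a sentence to the label whose model gives it the highest log-probability. It is only an illustration under stated assumptions: it uses a fixed-order model with add-one smoothing rather than Prediction by Partial Matching, it is not the authors' system, and the class name, function names, and toy training strings are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return (context, next_char) pairs, padding the start and marking the end."""
    padded = "^" * (n - 1) + text + "$"
    return [(padded[i:i + n - 1], padded[i + n - 1])
            for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Fixed-order character LM with add-one smoothing (an assumption, not PPM)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            for context, char in char_ngrams(sentence, self.n):
                self.ngram_counts[(context, char)] += 1
                self.context_counts[context] += 1
                self.vocab.add(char)

    def log_prob(self, text, vocab_size):
        total = 0.0
        for context, char in char_ngrams(text, self.n):
            numerator = self.ngram_counts[(context, char)] + 1
            denominator = self.context_counts[context] + vocab_size
            total += math.log(numerator / denominator)
        return total


def train_models(labelled_sentences, n=3):
    """Train one model per label from (sentence, label) pairs."""
    by_label = defaultdict(list)
    for sentence, label in labelled_sentences:
        by_label[label].append(sentence)
    models = {}
    for label, sentences in by_label.items():
        model = CharNgramLM(n)
        model.train(sentences)
        models[label] = model
    return models


def classify(models, text):
    """Pick the label whose model scores the text highest."""
    # Share one vocabulary size across models so their scores are comparable.
    vocab = set().union(*(m.vocab for m in models.values())) | set(text) | {"$"}
    return max(models, key=lambda label: models[label].log_prob(text, len(vocab)))


# Hypothetical toy data standing in for dialect-labelled sentences.
toy_data = [
    ("abab abba", "A"),
    ("baab", "A"),
    ("xyzz yxz", "B"),
    ("zzxy", "B"),
]
models = train_models(toy_data, n=3)
predicted = classify(models, "abab")
```

Lowering `n` gives a lower-order model of the kind the abstract reports as best on Subtask 2; the right order for real data would have to be chosen on held-out sentences.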
Anthology ID:
W19-4634
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
264–268
URL:
https://aclanthology.org/W19-4634
DOI:
10.18653/v1/W19-4634
Cite (ACL):
Tom Lippincott, Pamela Shapiro, Kevin Duh, and Paul McNamee. 2019. JHU System Description for the MADAR Arabic Dialect Identification Shared Task. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 264–268, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
JHU System Description for the MADAR Arabic Dialect Identification Shared Task (Lippincott et al., WANLP 2019)
PDF:
https://aclanthology.org/W19-4634.pdf