BERT-based Multi-Task Model for Country and Province Level MSA and Dialectal Arabic Identification

Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismail Berrada, Ahmed Khoumsi


Abstract
Dialect and standard language identification are crucial tasks for many Arabic natural language processing applications. In this paper, we present our deep learning-based system, submitted to the second NADI shared task for country-level and province-level identification of Modern Standard Arabic (MSA) and Dialectal Arabic (DA). The system is based on an end-to-end deep Multi-Task Learning (MTL) model that tackles both country-level and province-level MSA/DA identification. The MTL model consists of a shared Bidirectional Encoder Representations from Transformers (BERT) encoder, two task-specific attention layers, and two classifiers. Our key idea is to leverage both the task-discriminative and the inter-task shared features for country- and province-level MSA/DA identification. The obtained results show that our MTL model outperforms single-task models on most subtasks.
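The architecture described in the abstract (a shared encoder feeding two task-specific attention layers and two classifiers) can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: the shared BERT token outputs are stood in for by a random matrix, the attention is simple dot-product pooling, the weights are untrained, and the label counts (21 countries, 100 provinces) follow the NADI shared task setup as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, w):
    """Task-specific attention: score each token, return the weighted sum."""
    scores = softmax(H @ w)        # (seq_len,) attention weights over tokens
    return scores @ H              # (d,) pooled sentence representation

d, seq_len = 16, 10
# Stand-in for the shared BERT encoder's token-level outputs.
H = rng.normal(size=(seq_len, d))

# Hypothetical task-specific parameters (one attention vector and one
# classifier head per task; label counts assumed from the NADI setup).
w_country = rng.normal(size=d)
w_province = rng.normal(size=d)
W_country = rng.normal(size=(d, 21))    # 21 country labels
W_province = rng.normal(size=(d, 100))  # 100 province labels

# The same shared representation H feeds both task heads.
country_probs = softmax(attention_pool(H, w_country) @ W_country)
province_probs = softmax(attention_pool(H, w_province) @ W_province)
```

In training, the two task losses would be summed and back-propagated through the shared encoder, which is how the inter-task shared features mentioned in the abstract are learned.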
Anthology ID:
2021.wanlp-1.31
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
271–275
URL:
https://aclanthology.org/2021.wanlp-1.31
Cite (ACL):
Abdellah El Mekki, Abdelkader El Mahdaouy, Kabil Essefar, Nabil El Mamoun, Ismail Berrada, and Ahmed Khoumsi. 2021. BERT-based Multi-Task Model for Country and Province Level MSA and Dialectal Arabic Identification. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 271–275, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
BERT-based Multi-Task Model for Country and Province Level MSA and Dialectal Arabic Identification (El Mekki et al., WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.31.pdf