Contextual Embeddings for Arabic-English Code-Switched Data

Caroline Sabty, Mohamed Islam, Slim Abdennadher


Abstract
Globalization has caused the rise of the code-switching phenomenon in multilingual societies. In Arab countries, code-switching between Arabic and English has become frequent, especially on social media platforms. Consequently, research on Natural Language Processing (NLP) systems has increased to tackle this phenomenon. One of the significant challenges of developing code-switched NLP systems is the lack of data itself. In this paper, we propose open-source, trained bilingual contextual word embedding models based on FLAIR, BERT, and ELECTRA. We also propose KERMIT, a novel contextual word embedding model that maps Arabic and English words into a single vector space and is efficient in terms of data usage. We apply intrinsic and extrinsic evaluation methods to compare the performance of the models. Our results show that FLAIR and FastText achieve the highest results on the sentiment analysis task. However, KERMIT performs best on the intrinsic evaluation and on named entity recognition, and it outperforms the other transformer-based models on the question answering task.
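The released FLAIR-style embeddings can be consumed with the standard flair library API. Below is a minimal sketch of embedding a code-switched Arabic-English sentence; the checkpoint file names are hypothetical placeholders for the trained forward/backward character language models distributed with the linked repository.

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# Hypothetical paths: replace with the trained Arabic-English character
# language models released alongside the paper.
forward_lm = FlairEmbeddings("ar-en-forward.pt")
backward_lm = FlairEmbeddings("ar-en-backward.pt")

# Stack both directions so each token gets a single contextual vector.
embeddings = StackedEmbeddings([forward_lm, backward_lm])

# A code-switched Arabic-English sentence.
sentence = Sentence("أنا رايح ال gym بكرة")

# Embed in place; every token now carries its contextual embedding.
embeddings.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)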
Anthology ID:
2020.wanlp-1.20
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
215–225
URL:
https://aclanthology.org/2020.wanlp-1.20
Cite (ACL):
Caroline Sabty, Mohamed Islam, and Slim Abdennadher. 2020. Contextual Embeddings for Arabic-English Code-Switched Data. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 215–225, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
Contextual Embeddings for Arabic-English Code-Switched Data (Sabty et al., WANLP 2020)
PDF:
https://aclanthology.org/2020.wanlp-1.20.pdf
Code:
csabty/code-switch-arabic-english-contextual-embeddings