Cross-lingual transfer for low-resource Arabic language understanding

Khadige Abboud, Olga Golovneva, Christopher DiPersio


Abstract
This paper explores cross-lingual transfer learning in natural language understanding (NLU), with a focus on bootstrapping Arabic from high-resource English and French for domain classification, intent classification, and named entity recognition tasks. We adopt a BERT-based architecture and pretrain three models using open-source Wikipedia data and large-scale commercial datasets: a monolingual Arabic model, a bilingual Arabic-English model, and a trilingual Arabic-English-French model. Additionally, we use an off-the-shelf machine translator to translate internal data from the source language, English, to the target language, Arabic, in an effort to enhance transfer learning through translation. We conduct experiments that finetune the three models for NLU tasks and evaluate them on a large internal dataset. Despite the morphological, orthographical, and grammatical differences between Arabic and the source languages, transfer learning from the source languages and through machine translation yields performance gains on a real-world Arabic test dataset, both in a zero-shot setting and in a setting where the models are further finetuned on labeled data from the target language.
Anthology ID:
2022.wanlp-1.21
Volume:
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
225–237
URL:
https://aclanthology.org/2022.wanlp-1.21
DOI:
10.18653/v1/2022.wanlp-1.21
Cite (ACL):
Khadige Abboud, Olga Golovneva, and Christopher DiPersio. 2022. Cross-lingual transfer for low-resource Arabic language understanding. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 225–237, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Cross-lingual transfer for low-resource Arabic language understanding (Abboud et al., WANLP 2022)
PDF:
https://aclanthology.org/2022.wanlp-1.21.pdf