AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar


Abstract
Developing monolingual large Pre-trained Language Models (PLMs) has been shown to be very successful in handling different tasks in Natural Language Processing (NLP). In this work, we present AraMUS, the largest Arabic PLM with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.
Anthology ID:
2023.findings-acl.181
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2883–2894
URL:
https://aclanthology.org/2023.findings-acl.181
DOI:
10.18653/v1/2023.findings-acl.181
Cite (ACL):
Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, and Abbas Ghaddar. 2023. AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2883–2894, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing (Alghamdi et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.181.pdf
Video:
https://aclanthology.org/2023.findings-acl.181.mp4