Arabic Compact Language Modelling for Resource Limited Devices

Zaid Alyafeai, Irfan Ahmad


Abstract
Natural language modelling has gained a lot of interest recently. Current state-of-the-art results are achieved by first training a very large language model and then fine-tuning it on multiple tasks. However, there is little work on smaller, more compact language models for resource-limited devices or applications, let alone on how to efficiently train such models for a low-resource language like Arabic. In this paper, we investigate how such models can be trained in a compact way for Arabic. We also show how distillation and quantization can be applied to create even smaller models. Our experiments show that our largest model, which is 2x smaller than the baseline, can achieve better results on multiple tasks with half as much pretraining data.
Anthology ID:
2021.wanlp-1.6
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Venues:
EACL | WANLP
Publisher:
Association for Computational Linguistics
Pages:
53–59
URL:
https://aclanthology.org/2021.wanlp-1.6
Cite (ACL):
Zaid Alyafeai and Irfan Ahmad. 2021. Arabic Compact Language Modelling for Resource Limited Devices. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 53–59, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
Arabic Compact Language Modelling for Resource Limited Devices (Alyafeai & Ahmad, WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.6.pdf
Data:
ARCD, ASTD, LABR