Scaling Laws for BERT in Low-Resource Settings

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa


Abstract
Large language models are very resource intensive, both financially and environmentally, and require an amount of training data that is simply unobtainable for the majority of NLP practitioners. Previous work has researched the scaling laws of such models, but the optimal ratios of model parameters, dataset size, and computation cost were established at the large scale. In contrast, we analyze the effect those variables have on the performance of language models in constrained settings, by building three lightweight BERT models (16M/51M/124M parameters) trained over a set of small corpora (5M/25M/125M words). We experiment on four languages with different linguistic characteristics (Basque, Spanish, Swahili and Finnish), and evaluate the models on MLM and several NLU tasks. We conclude that the power laws for parameters, data and compute in low-resource settings differ from the optimal scaling laws previously inferred, and that data requirements should be higher. Our insights are consistent across all the languages we study, as well as across the MLM and downstream tasks. Furthermore, we experimentally establish when the cost of using a Transformer-based approach is worth taking, instead of favouring other computationally lighter solutions.
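For reference, the "optimal scaling laws previously inferred" are power-law fits of pretraining loss against model size and data size established on far larger corpora. A common parameterisation from that prior large-scale work (shown here only as an illustration of the general form, not as the fit reported in this paper) is

    L(N, D) = E + A / N^{\alpha} + B / D^{\beta}

where L is the (MLM) pretraining loss, N the number of parameters, D the amount of training data, and E, A, B, \alpha, \beta empirically fitted constants. The abstract's claim is that, at the 5M/25M/125M word scale studied here, the trade-off implied by such large-scale fits no longer holds and relatively more data is needed.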
Anthology ID: 2023.findings-acl.492
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7771–7789
URL: https://aclanthology.org/2023.findings-acl.492
DOI: 10.18653/v1/2023.findings-acl.492
Cite (ACL): Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, and Aitor Soroa. 2023. Scaling Laws for BERT in Low-Resource Settings. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7771–7789, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Scaling Laws for BERT in Low-Resource Settings (Urbizu et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.492.pdf