Critical Size Hypothesis: How Model Hyperparameters Correlate with Its Linguistic Abilities

Ekaterina Voloshina, Oleg Serikov


Abstract
In recent years, language models have been tested on various probing tasks to examine their linguistic knowledge. However, few researchers have explored the process of language acquisition itself. Analyzing language acquisition during training could shed light on which model parameters help a model acquire language faster. In this work, we experiment with model hyperparameters and find that hidden size is the most important factor for a model's language acquisition.
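The abstract describes the setup only at a high level. Below is a minimal PyTorch sketch of one plausible realization: training small language models that differ in a single hyperparameter (here, hidden size) and extracting frozen representations for a linear probe. TinyLM, probe_features, and all hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import torch
from torch import nn

class TinyLM(nn.Module):
    """Small transformer LM; only hidden_size varies between runs (assumption)."""
    def __init__(self, vocab_size=1000, hidden_size=128, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        block = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=num_layers)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, ids):
        hidden = self.encoder(self.embed(ids))
        return self.lm_head(hidden), hidden  # logits + states for probing

@torch.no_grad()
def probe_features(model, ids):
    """Mean-pool frozen hidden states as input to a linear probe (sketch)."""
    _, hidden = model(ids)
    return hidden.mean(dim=1)

# Sweep the hyperparameter of interest; in the paper's setting, a probing
# classifier would be fit on these features at successive training checkpoints.
for hidden_size in (64, 128, 256, 512):
    model = TinyLM(hidden_size=hidden_size)
    ids = torch.randint(0, 1000, (8, 16))   # stand-in token batch
    features = probe_features(model, ids)    # shape: (8, hidden_size)
    print(hidden_size, features.shape)
```

In this kind of sweep, probe accuracy tracked over training steps for each hidden size would indicate which configuration acquires the probed linguistic property fastest.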
Anthology ID:
2024.clasp-1.1
Volume:
Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning
Month:
October
Year:
2024
Address:
Gothenburg, Sweden
Editors:
Amy Qiu, Bill Noble, David Pagmar, Vladislav Maraev, Nikolai Ilinykh
Venue:
CLASP
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
1–7
URL:
https://aclanthology.org/2024.clasp-1.1
Cite (ACL):
Ekaterina Voloshina and Oleg Serikov. 2024. Critical Size Hypothesis: How Model Hyperparameters Correlate with Its Linguistic Abilities. In Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning, pages 1–7, Gothenburg, Sweden. Association for Computational Linguistics.
Cite (Informal):
Critical Size Hypothesis: How Model Hyperparameters Correlate with Its Linguistic Abilities (Voloshina & Serikov, CLASP 2024)
PDF:
https://aclanthology.org/2024.clasp-1.1.pdf