When the Student Becomes the Master: Learning Better and Smaller Monolingual Models from mBERT

Pranaydeep Singh, Els Lefever


Abstract
In this research, we present pilot experiments to distil monolingual models from a jointly trained model for 102 languages (mBERT). We demonstrate that the distilled student model for a target language can outperform the original model, even with a basic distillation setup. We evaluate our methodology on six languages that vary in resource availability and language family.
Anthology ID:
2022.coling-1.391
Volume:
Proceedings of the 29th International Conference on Computational Linguistics
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4434–4441
URL:
https://aclanthology.org/2022.coling-1.391
Cite (ACL):
Pranaydeep Singh and Els Lefever. 2022. When the Student Becomes the Master: Learning Better and Smaller Monolingual Models from mBERT. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4434–4441, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal):
When the Student Becomes the Master: Learning Better and Smaller Monolingual Models from mBERT (Singh & Lefever, COLING 2022)
PDF:
https://aclanthology.org/2022.coling-1.391.pdf