Memorisation versus Generalisation in Pre-trained Language Models

Michael Tänzer, Sebastian Ruder, Marek Rei


Abstract
State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
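As a rough illustration of the idea behind prototypical networks mentioned in the abstract, the sketch below shows only the generic classification rule: each class is represented by the mean of its support embeddings, and a query is assigned to the nearest prototype. It is not the authors' extension. The toy 2-D vectors stand in for token embeddings that would come from a pre-trained model, and all function names and data are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(support):
    """Map each label to the mean of its support embeddings (its prototype)."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype lies closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical toy embeddings: two support examples per entity class.
support = {
    "PER": [[1.0, 0.0], [0.8, 0.2]],
    "LOC": [[0.0, 1.0], [0.2, 0.8]],
    "O": [[-1.0, -1.0], [-0.8, -1.2]],
}

prototypes = build_prototypes(support)
predicted = classify([0.9, 0.1], prototypes)
print(predicted)
```

Because a prototype is just an average, a class can be represented from only a handful of labelled examples, which is why this family of methods is commonly used in few-shot settings.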
Anthology ID:
2022.acl-long.521
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7564–7578
URL:
https://aclanthology.org/2022.acl-long.521
DOI:
10.18653/v1/2022.acl-long.521
Cite (ACL):
Michael Tänzer, Sebastian Ruder, and Marek Rei. 2022. Memorisation versus Generalisation in Pre-trained Language Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7564–7578, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Memorisation versus Generalisation in Pre-trained Language Models (Tänzer et al., ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.521.pdf
Video:
https://aclanthology.org/2022.acl-long.521.mp4
Code:
Michael-Tanzer/BERT-mem-lowres
Data:
CIFAR-10, CoNLL 2003, CoNLL++, WNUT 2017