Zebulon Goriely
2024
Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing
Richard Diehl Martinez | Zebulon Goriely | Andrew Caines | Paula Buttery | Lisa Beinborn
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Language models strongly rely on frequency information because they maximize the likelihood of tokens during pre-training. As a consequence, language models tend not to generalize well to tokens that are seldom seen during training. Moreover, maximum likelihood training has been discovered to give rise to anisotropy: representations of tokens in a model tend to cluster tightly in a high-dimensional cone, rather than spreading out over their representational capacity. Our work introduces a method for quantifying the frequency bias of a language model by assessing sentence-level perplexity with respect to token-level frequency. We then present a method for reducing the frequency bias of a language model by inducing a syntactic prior over token representations during pre-training. Our Syntactic Smoothing method adjusts the maximum likelihood objective function to distribute the learning signal to syntactically similar tokens. This approach results in better performance on infrequent English tokens and a decrease in anisotropy. We empirically show that the degree of anisotropy in a model correlates with its frequency bias.
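To make the objective adjustment concrete, below is a minimal sketch of what a syntactically smoothed cross-entropy loss could look like in PyTorch. The abstract only states that the maximum likelihood objective is modified to distribute the learning signal to syntactically similar tokens; the mixing weight `alpha` and the row-normalized `syntactic_prior` matrix used here are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a "syntactically smoothed" training objective.
# Assumptions (not taken from the paper): a fixed mixing weight `alpha`
# and a precomputed (vocab, vocab) prior whose row i spreads probability
# mass over tokens that behave syntactically like token i.
import torch
import torch.nn.functional as F


def syntactic_smoothing_loss(
    logits: torch.Tensor,           # (batch, vocab) model outputs
    targets: torch.Tensor,          # (batch,) gold token ids
    syntactic_prior: torch.Tensor,  # (vocab, vocab), each row sums to 1
    alpha: float = 0.1,             # fraction of mass moved off the gold token
) -> torch.Tensor:
    log_probs = F.log_softmax(logits, dim=-1)

    # Soft target: keep (1 - alpha) on the gold token and distribute the
    # remaining alpha over syntactically similar tokens via the prior.
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    soft_target = (1.0 - alpha) * one_hot + alpha * syntactic_prior[targets]

    # Cross-entropy against the smoothed target distribution.
    return -(soft_target * log_probs).sum(dim=-1).mean()
```

With a uniform prior this reduces to ordinary label smoothing; the idea sketched here is that replacing the uniform distribution with a syntax-aware one sends part of the gradient signal to rarely seen but syntactically similar tokens.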
2023
CLIMB – Curriculum Learning for Infant-inspired Model Building
Richard Diehl Martinez | Hope McGovern | Zebulon Goriely | Christopher Davis | Andrew Caines | Paula Buttery | Lisa Beinborn
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
Co-authors
- Richard Diehl Martinez 2
- Andrew Caines 2
- Paula Buttery 2
- Lisa Beinborn 2
- Hope McGovern 1
- show all...