2024
Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly
Bastian Bunzeck | Sina Zarrieß
Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning
Syntactic learning curves in LMs are usually reported as relatively stable and power-law-shaped. By analyzing the learning curves of different LMs on various syntactic phenomena, using both small self-trained Llama models and larger pre-trained Pythia models, we show that while many phenomena do follow typical power-law curves, others exhibit S-shaped, U-shaped, or erratic patterns. Certain syntactic paradigms remain challenging even for large models, resulting in a persistent preference for ungrammatical sentences. Most phenomena show similar curves across their paradigms, but the existence of diverging patterns and oscillations indicates that average curves mask important developments, underscoring the need for more detailed analyses of individual learning trajectories.
The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns
Bastian Bunzeck | Sina Zarrieß
Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP
We introduce SlayQA, a novel benchmark data set designed to evaluate language models’ ability to handle gender-inclusive language, specifically the use of neopronouns, in a question-answering setting. Derived from the Social IQa data set, SlayQA modifies context-question-answer triples to include gender-neutral pronouns, creating a significant linguistic distribution shift in comparison to common pre-training corpora like C4 or Dolma. Our results show that state-of-the-art language models struggle with this challenge, exhibiting small but noticeable performance drops when answering questions containing neopronouns compared to those without.
2023
Entrenchment Matters: Investigating Positional and Constructional Sensitivity in Small and Large Language Models
Bastian Bunzeck | Sina Zarrieß
Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
The success of large language models (LMs) has also prompted a push towards smaller models, but the differences in functionality and encodings between these two types of models are not yet well understood. In this paper, we employ a perturbed masking approach to investigate differences in token influence patterns on the sequence embeddings of larger and smaller RoBERTa models. Specifically, we explore how token properties like position, length or part of speech influence their sequence embeddings. We find that there is a general tendency for sequence-final tokens to exert a higher influence. Among part-of-speech tags, nouns, numerals and punctuation marks are the most influential, with smaller deviations for individual models. These findings also align with usage-based linguistic evidence on the effect of entrenchment. Finally, we show that the relationship between data size and model size influences the variability and brittleness of these effects, hinting towards a need for holistically balanced models.
GPT-wee: How Small Can a Small Language Model Really Get?
Bastian Bunzeck | Sina Zarrieß
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning