Leroy Zhifei Wang

2023

GQG: Generalized Quantifier Generalization - A Dataset for Evaluating Quantifier Semantics Understanding in Language Models
Leroy Zhifei Wang | Shane Steinert-Threlkeld
Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP

We present a new dataset consisting of various quantifier expressions to evaluate the generalization abilities of language models. The dataset contains 18,360 prompts encompassing diverse quantifiers, forming the basis of a new framework for assessing semantic understanding in this domain. We test the effectiveness of our dataset using Pythia models, ranging from 410 million to 6.9 billion, showing that quantifier-based tasks can be challenging for current language models. We make our code and data publicly available, such that the dataset can be easily extended or updated based on different evaluation needs.

Co-authors

Shane Steinert-Threlkeld 1

Venues

GenBench1
WS1

Fix author