Snoopy: An Online Interface for Exploring the Effect of Pretraining Term Frequencies on Few-Shot LM Performance

Yasaman Razeghi, Raja Sekhar Reddy Mekala, Robert L Logan Iv, Matt Gardner, Sameer Singh


Abstract
Current evaluation schemes for large language models often fail to consider the impact of the overlap between pretraining corpus and test data on model performance statistics. Snoopy is an online interface that allows researchers to study this impact in few-shot learning settings. Our demo provides term frequency statistics for the Pile, which is an 800 GB corpus, accompanied by the precomputed performance of EleutherAI/GPT models on more than 20 NLP benchmarks, including numerical, commonsense reasoning, natural language understanding, and question-answering tasks. Snoopy allows a user to interactively align specific terms in test instances with their frequency in the Pile, enabling exploratory analysis of how term frequency is related to the accuracy of the models, which are hard to discover through automated means. A user can look at correlations over various model sizes and numbers of in-context examples and visualize the result across multiple (potentially aggregated) datasets. Using Snoopy, we show that a researcher can quickly replicate prior analyses for numerical tasks, while simultaneously allowing for much more expansive exploration that was previously challenging. Snoopy is available at https://nlp.ics.uci.edu/snoopy.
Anthology ID:
2022.emnlp-demos.39
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month:
December
Year:
2022
Address:
Abu Dhabi, UAE
Editors:
Wanxiang Che, Ekaterina Shutova
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
389–395
Language:
URL:
https://aclanthology.org/2022.emnlp-demos.39
DOI:
10.18653/v1/2022.emnlp-demos.39
Bibkey:
Cite (ACL):
Yasaman Razeghi, Raja Sekhar Reddy Mekala, Robert L Logan Iv, Matt Gardner, and Sameer Singh. 2022. Snoopy: An Online Interface for Exploring the Effect of Pretraining Term Frequencies on Few-Shot LM Performance. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 389–395, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal):
Snoopy: An Online Interface for Exploring the Effect of Pretraining Term Frequencies on Few-Shot LM Performance (Razeghi et al., EMNLP 2022)
Copy Citation:
PDF:
https://aclanthology.org/2022.emnlp-demos.39.pdf