GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Sidney Black; Stella Biderman; Eric Hallahan; Quentin Anthony; Leo Gao; Laurence Golding; Horace He; Connor Leahy; Kyle McDonell; Jason Phang; Michael Pieler; Usvsn Sai Prashanth; Shivanshu Purohit; Laria Reynolds; Jonathan Tow; Ben Wang; Samuel Weinbach

doi:10.18653/v1/2022.bigscience-1.9

Abstract

We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge, the largest dense autoregressive model that has publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B’s architecture and training, and evaluate its performance. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.

Anthology ID: 2022.bigscience-1.9
Volume: Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models
Month: May
Year: 2022
Address: virtual+Dublin
Editors: Angela Fan, Suzana Ilic, Thomas Wolf, Matthias Gallé
Venue: BigScience
Publisher: Association for Computational Linguistics
Pages: 95–136
URL: https://aclanthology.org/2022.bigscience-1.9
DOI: 10.18653/v1/2022.bigscience-1.9
Cite (ACL): Sidney Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, Usvsn Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. 2022. GPT-NeoX-20B: An Open-Source Autoregressive Language Model. In Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models, pages 95–136, virtual+Dublin. Association for Computational Linguistics.
Cite (Informal): GPT-NeoX-20B: An Open-Source Autoregressive Language Model (Black et al., BigScience 2022)
PDF: https://aclanthology.org/2022.bigscience-1.9.pdf
Code: eleutherai/gpt-neox
Data: HellaSwag, LAMBADA, LogiQA, MATH, MMLU, PIQA, PROST, The Pile
