Hyperparameter Power Impact in Transformer Language Model Training

Lucas Høyberg Puvis de Chavannes, Mads Guldborg Kjeldgaard Kongsbak, Timmie Rantzau, Leon Derczynski


Abstract
Training large language models can consume a large amount of energy. We hypothesize that the language model’s configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To investigate these claims, we introduce a power consumption factor to the objective function, and explore the range of models and hyperparameter configurations that affect power. We identify multiple configuration factors that can reduce power consumption during language model training while retaining model quality.
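As an illustration of how a power consumption factor might enter the objective function, one option is an additive penalty on the standard language-modelling loss. This is a minimal sketch under assumed notation, not necessarily the paper's exact formulation:

\mathcal{L}_{\text{total}}(\theta, c) = \mathcal{L}_{\text{LM}}(\theta) + \lambda \, P(c)

Here \mathcal{L}_{\text{LM}} is the cross-entropy language-modelling loss for model parameters \theta, P(c) is the measured power draw under hyperparameter configuration c, and \lambda is an assumed weighting term that trades off model quality against power consumption.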
Anthology ID: 2021.sustainlp-1.12
Volume: Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing
Month: November
Year: 2021
Address: Virtual
Editors: Nafise Sadat Moosavi, Iryna Gurevych, Angela Fan, Thomas Wolf, Yufang Hou, Ana Marasović, Sujith Ravi
Venue: sustainlp
Publisher: Association for Computational Linguistics
Pages: 96–118
URL: https://aclanthology.org/2021.sustainlp-1.12
DOI: 10.18653/v1/2021.sustainlp-1.12
Cite (ACL): Lucas Høyberg Puvis de Chavannes, Mads Guldborg Kjeldgaard Kongsbak, Timmie Rantzau, and Leon Derczynski. 2021. Hyperparameter Power Impact in Transformer Language Model Training. In Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, pages 96–118, Virtual. Association for Computational Linguistics.
Cite (Informal): Hyperparameter Power Impact in Transformer Language Model Training (Puvis de Chavannes et al., sustainlp 2021)
PDF: https://aclanthology.org/2021.sustainlp-1.12.pdf
Video: https://aclanthology.org/2021.sustainlp-1.12.mp4
Code: StrombergNLP/Low-Carbon-NLP