The Effect of Sampling Temperature on Problem Solving in Large Language Models

Matthew Renze


Abstract
In this study, we empirically investigate the effect of sampling temperature on the performance of Large Language Models (LLMs) on various problem-solving tasks. We created a multiple-choice question-and-answer (MCQA) exam by randomly sampling problems from standard LLM benchmarks. We then used nine popular LLMs with five prompt-engineering techniques to solve the MCQA problems while increasing the sampling temperature from 0.0 to 1.6. Despite anecdotal reports to the contrary, our empirical results indicate that changes in temperature from 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks. In addition, these results appear to generalize across LLMs, prompt-engineering techniques, and problem domains. All code, data, and supplemental materials are available on GitHub at: https://github.com/matthewrenze/jhu-llm-temperature
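For context on the parameter under study: sampling temperature T rescales a model's next-token logits before the softmax, so p_i = exp(z_i / T) / Σ_j exp(z_j / T). As T approaches 0, decoding becomes greedy; T = 1 leaves the distribution unchanged; T > 1 flattens it toward uniform. The sketch below is a minimal plain-Python illustration of this standard formulation, using hypothetical logit values; it is not taken from the paper's released code.

import math

def sample_probs(logits, temperature):
    # Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T).
    if temperature == 0.0:
        # Greedy limit: place all probability mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits over a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.0, 0.5, 1.0, 1.6):
    print(f"T={t}:", [round(p, 3) for p in sample_probs(logits, t)])

Running this shows the effect the paper measures: at T = 0.0 the model always picks the top token, while at T = 1.6 the probability mass spreads across the alternatives.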
Anthology ID: 2024.findings-emnlp.432
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7346–7356
URL: https://aclanthology.org/2024.findings-emnlp.432
Cite (ACL): Matthew Renze. 2024. The Effect of Sampling Temperature on Problem Solving in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7346–7356, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): The Effect of Sampling Temperature on Problem Solving in Large Language Models (Renze, Findings 2024)
PDF: https://aclanthology.org/2024.findings-emnlp.432.pdf
Software: 2024.findings-emnlp.432.software.zip
Data: 2024.findings-emnlp.432.data.zip