Decoding Stumpers: Large Language Models vs. Human Problem-Solvers

Alon Goldstein, Miriam Havin, Roi Reichart, Ariel Goldstein


Abstract
This paper investigates the problem-solving capabilities of Large Language Models (LLMs) by evaluating their performance on stumpers: unique single-step intuition problems that are challenging for human solvers but whose solutions are easily verified. We compare the performance of four state-of-the-art LLMs (Davinci-2, Davinci-3, GPT-3.5-Turbo, GPT-4) with that of human participants. Our findings reveal that the new-generation LLMs excel at solving stumpers and surpass human performance. However, humans exhibit superior skill in verifying solutions to the same problems. This research deepens our understanding of LLMs’ cognitive abilities and offers insights into improving their problem-solving potential across various domains.

Anthology ID: 2023.findings-emnlp.779
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11644–11653
URL: https://aclanthology.org/2023.findings-emnlp.779
DOI: 10.18653/v1/2023.findings-emnlp.779
Cite (ACL): Alon Goldstein, Miriam Havin, Roi Reichart, and Ariel Goldstein. 2023. Decoding Stumpers: Large Language Models vs. Human Problem-Solvers. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 11644–11653, Singapore. Association for Computational Linguistics.
Cite (Informal): Decoding Stumpers: Large Language Models vs. Human Problem-Solvers (Goldstein et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-emnlp.779.pdf