Riddle Me This: Evaluating Large Language Models in Solving Word-Based Games

Raffaele Manna, Maria Pia di Buono, Johanna Monti


Abstract
In this contribution, we examine the proficiency of Large Language Models (LLMs) in solving the linguistic game “La Ghigliottina,” the final game of the popular Italian TV quiz show “L’Eredità.” This game is particularly challenging because it requires LLMs to engage in semantic inference to identify the game's solutions. Our experiment draws inspiration from Ghigliottin-AI, a task of EVALITA 2020, an evaluation campaign focusing on Natural Language Processing (NLP) and speech tools designed for the Italian language. To benchmark our experiment, we use the results of the most successful artificial player in this task, namely Il Mago della Ghigliottina. The paper describes the experimental setting and the results, which show that LLMs perform poorly.
Anthology ID:
2024.games-1.11
Volume:
Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Chris Madge, Jon Chamberlain, Karen Fort, Udo Kruschwitz, Stephanie Lukin
Venues:
games | WS
Publisher:
ELRA and ICCL
Pages:
97–106
URL:
https://aclanthology.org/2024.games-1.11
Cite (ACL):
Raffaele Manna, Maria Pia di Buono, and Johanna Monti. 2024. Riddle Me This: Evaluating Large Language Models in Solving Word-Based Games. In Proceedings of the 10th Workshop on Games and Natural Language Processing @ LREC-COLING 2024, pages 97–106, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Riddle Me This: Evaluating Large Language Models in Solving Word-Based Games (Manna et al., games-WS 2024)
PDF:
https://aclanthology.org/2024.games-1.11.pdf