Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses

Gabriele Sarti, Tommaso Caselli, Malvina Nissim, Arianna Bisazza


Abstract
Rebuses are puzzles requiring constrained multi-step reasoning to identify a hidden phrase from a set of images and letters. In this work, we introduce a large collection of verbalized rebuses for the Italian language and use it to assess the rebus-solving capabilities of state-of-the-art large language models. While general-purpose systems such as LLaMA-3 and GPT-4o perform poorly on this task, ad-hoc fine-tuning seems to improve models’ performance. However, we find that performance gains from training are largely driven by memorization. Our results suggest that rebus solving remains a challenging test bed to evaluate large language models’ linguistic proficiency and sequential instruction-following skills.
Anthology ID:
2024.clicit-1.96
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
888–897
URL:
https://aclanthology.org/2024.clicit-1.96/
Cite (ACL):
Gabriele Sarti, Tommaso Caselli, Malvina Nissim, and Arianna Bisazza. 2024. Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 888–897, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Non Verbis, Sed Rebus: Large Language Models Are Weak Solvers of Italian Rebuses (Sarti et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.96.pdf