Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?

Luca Moroni; Javier Aula-Blasco; Simone Conia; Irene Baucells; Naiara Pérez; Silvia Paniagua Suárez; Anna Sallés; Malte Ostendorff; Júlia Falcão; Guijin Son; Aitor González-Agirre; Roberto Navigli; Marta Villegas

doi:10.18653/v1/2025.emnlp-main.1731

Abstract

As large language models (LLMs) continue to improve, their evaluation increasingly centers on complex, high-level tasks, often at the expense of systematically assessing fundamental capabilities. To address this gap, recent work proposed LMentry, a compact benchmark comprising tasks that are trivial for humans but remain surprisingly difficult for LLMs. However, LMentry is limited to English, leaving its insights linguistically narrow. In this paper, we present Multi-LMentry, a ground-up recreation of LMentry that enables systematic evaluation of LLMs on basic reasoning and understanding tasks across nine diverse languages. Multi-LMentry includes English and expands to Basque, Brazilian Portuguese, Catalan, Galician, German, Italian, Korean, and Spanish, emphasizing the importance of cross-lingual and low-resource settings. To validate that Multi-LMentry is still trivial for humans, we demonstrate that L2 speakers with only elementary proficiency achieve near-perfect scores in a low-resource language, namely, Basque. Through extensive experiments, we reveal that state-of-the-art open-weight multilingual LLMs still fall short of human performance on elementary tasks in many languages. Our results expose new failure modes that remain hidden in monolingual evaluation, underscoring the need for rigorous, language-diverse “unit tests” of core model abilities.

Anthology ID:: 2025.emnlp-main.1731
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
Publisher:: Association for Computational Linguistics
Pages:: 34126–34157
URL:: https://aclanthology.org/2025.emnlp-main.1731/
DOI:: 10.18653/v1/2025.emnlp-main.1731
Cite (ACL):: Luca Moroni, Javier Aula-Blasco, Simone Conia, Irene Baucells, Naiara Perez, Silvia Paniagua Suárez, Anna Sallés, Malte Ostendorff, Júlia Falcão, Guijin Son, Aitor Gonzalez-Agirre, Roberto Navigli, and Marta Villegas. 2025. Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34126–34157, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages? (Moroni et al., EMNLP 2025)
PDF:: https://aclanthology.org/2025.emnlp-main.1731.pdf
Checklist:: 2025.emnlp-main.1731.checklist.pdf