@inproceedings{khademi-khaledi-faili-2025-iruex,
    title = "{IRUEX}: A Study on Large Language Models Problem-Solving Skills in {I}ran's University Entrance Exam",
    author = "Khademi Khaledi, Hamed and
      Faili, Heshaam",
    editor = "Rambow, Owen and
      Wanner, Leo and
      Apidianaki, Marianna and
      Al-Khalifa, Hend and
      Di Eugenio, Barbara and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.434/",
    pages = "6505--6519",
    abstract = "In this paper, we present the IRUEX dataset, a novel multiple-choice educational resource specifically designed to evaluate the performance of Large Language Models (LLMs) across seven distinct categories. The dataset contains 868 Iran university entrance exam questions (Konkour) and 36,485 additional questions. Each additional question is accompanied by detailed solutions, and the dataset also includes relevant high school textbooks, providing comprehensive study material. A key feature of IRUEX is its focus on underrepresented languages, particularly assessing problem-solving skills, language proficiency, and reasoning. Our evaluation shows that GPT-4o outperforms the other LLMs tested on the IRUEX dataset. Techniques such as few-shot learning and retrieval-augmented generation (RAG) display varied effects across different categories, highlighting their unique strengths in specific areas. Additionally, a comprehensive user study classifies the errors made by LLMs into ten problem-solving ability categories. The analysis highlights that calculations and linguistic knowledge, particularly in low-resource languages, remain significant weaknesses in current LLMs. IRUEX has the potential to serve as a benchmark for evaluating the reasoning capabilities of LLMs in non-English settings, providing a foundation for improving their performance in diverse languages and contexts."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="khademi-khaledi-faili-2025-iruex">
<titleInfo>
<title>IRUEX: A Study on Large Language Models Problem-Solving Skills in Iran’s University Entrance Exam</title>
</titleInfo>
<name type="personal">
<namePart type="given">Hamed</namePart>
<namePart type="family">Khademi Khaledi</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Heshaam</namePart>
<namePart type="family">Faili</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-01</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 31st International Conference on Computational Linguistics</title>
</titleInfo>
<name type="personal">
<namePart type="given">Owen</namePart>
<namePart type="family">Rambow</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Leo</namePart>
<namePart type="family">Wanner</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marianna</namePart>
<namePart type="family">Apidianaki</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Hend</namePart>
<namePart type="family">Al-Khalifa</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Barbara</namePart>
<namePart type="family">Di Eugenio</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Steven</namePart>
<namePart type="family">Schockaert</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Abu Dhabi, UAE</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>In this paper, we present the IRUEX dataset, a novel multiple-choice educational resource specifically designed to evaluate the performance of Large Language Models (LLMs) across seven distinct categories. The dataset contains 868 Iran university entrance exam questions (Konkour) and 36,485 additional questions. Each additional question is accompanied by detailed solutions, and the dataset also includes relevant high school textbooks, providing comprehensive study material. A key feature of IRUEX is its focus on underrepresented languages, particularly assessing problem-solving skills, language proficiency, and reasoning. Our evaluation shows that GPT-4o outperforms the other LLMs tested on the IRUEX dataset. Techniques such as few-shot learning and retrieval-augmented generation (RAG) display varied effects across different categories, highlighting their unique strengths in specific areas. Additionally, a comprehensive user study classifies the errors made by LLMs into ten problem-solving ability categories. The analysis highlights that calculations and linguistic knowledge, particularly in low-resource languages, remain significant weaknesses in current LLMs. IRUEX has the potential to serve as a benchmark for evaluating the reasoning capabilities of LLMs in non-English settings, providing a foundation for improving their performance in diverse languages and contexts</abstract>
<identifier type="citekey">khademi-khaledi-faili-2025-iruex</identifier>
<location>
<url>https://aclanthology.org/2025.coling-main.434/</url>
</location>
<part>
<date>2025-01</date>
<extent unit="page">
<start>6505</start>
<end>6519</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T IRUEX: A Study on Large Language Models Problem-Solving Skills in Iran’s University Entrance Exam
%A Khademi Khaledi, Hamed
%A Faili, Heshaam
%Y Rambow, Owen
%Y Wanner, Leo
%Y Apidianaki, Marianna
%Y Al-Khalifa, Hend
%Y Di Eugenio, Barbara
%Y Schockaert, Steven
%S Proceedings of the 31st International Conference on Computational Linguistics
%D 2025
%8 January
%I Association for Computational Linguistics
%C Abu Dhabi, UAE
%F khademi-khaledi-faili-2025-iruex
%X In this paper, we present the IRUEX dataset, a novel multiple-choice educational resource specifically designed to evaluate the performance of Large Language Models (LLMs) across seven distinct categories. The dataset contains 868 Iran university entrance exam questions (Konkour) and 36,485 additional questions. Each additional question is accompanied by detailed solutions, and the dataset also includes relevant high school textbooks, providing comprehensive study material. A key feature of IRUEX is its focus on underrepresented languages, particularly assessing problem-solving skills, language proficiency, and reasoning. Our evaluation shows that GPT-4o outperforms the other LLMs tested on the IRUEX dataset. Techniques such as few-shot learning and retrieval-augmented generation (RAG) display varied effects across different categories, highlighting their unique strengths in specific areas. Additionally, a comprehensive user study classifies the errors made by LLMs into ten problem-solving ability categories. The analysis highlights that calculations and linguistic knowledge, particularly in low-resource languages, remain significant weaknesses in current LLMs. IRUEX has the potential to serve as a benchmark for evaluating the reasoning capabilities of LLMs in non-English settings, providing a foundation for improving their performance in diverse languages and contexts
%U https://aclanthology.org/2025.coling-main.434/
%P 6505-6519
Markdown (Informal)
[IRUEX: A Study on Large Language Models Problem-Solving Skills in Iran’s University Entrance Exam](https://aclanthology.org/2025.coling-main.434/) (Khademi Khaledi & Faili, COLING 2025)
ACL