From KMMLU-Redux to Pro: A Professional Korean Benchmark Suite for LLM Evaluation

Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee


Abstract
The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea.
Anthology ID:
2025.findings-emnlp.1038
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19067–19096
URL:
https://aclanthology.org/2025.findings-emnlp.1038/
Cite (ACL):
Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, and Jinsik Lee. 2025. From KMMLU-Redux to Pro: A Professional Korean Benchmark Suite for LLM Evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19067–19096, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
From KMMLU-Redux to Pro: A Professional Korean Benchmark Suite for LLM Evaluation (Hong et al., Findings 2025)
PDF:
https://aclanthology.org/2025.findings-emnlp.1038.pdf
Checklist:
2025.findings-emnlp.1038.checklist.pdf