CMMLU: Measuring massive multitask language understanding in Chinese

Haonan Li; Yixuan Zhang; Fajri Koto; Yifei Yang; Hai Zhao; Yeyun Gong; Nan Duan; Timothy Baldwin

Abstract

As the capabilities of large language models (LLMs) continue to advance, evaluating their performance is becoming more important and more challenging. This paper aims to address this issue for Mandarin Chinese in the form of CMMLU, a comprehensive Chinese benchmark that covers various subjects, including natural sciences, social sciences, engineering, and the humanities. We conduct a thorough evaluation of more than 20 contemporary multilingual and Chinese LLMs, assessing their performance across different subjects and settings. The results reveal that most existing LLMs struggle to achieve an accuracy of even 60%, which is the pass mark for Chinese exams. This highlights that there is substantial room for improvement in the capabilities of LLMs. Additionally, we conduct extensive experiments to identify factors impacting the models’ performance and propose directions for enhancing LLMs. CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models for Chinese.

Anthology ID: 2024.findings-acl.671
Volume: Findings of the Association for Computational Linguistics ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand and virtual meeting
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11260–11285
URL: https://aclanthology.org/2024.findings-acl.671
Cite (ACL): Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, and Timothy Baldwin. 2024. CMMLU: Measuring massive multitask language understanding in Chinese. In Findings of the Association for Computational Linguistics ACL 2024, pages 11260–11285, Bangkok, Thailand and virtual meeting. Association for Computational Linguistics.
Cite (Informal): CMMLU: Measuring massive multitask language understanding in Chinese (Li et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.671.pdf
