Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Yancheng He; Shilong Li; Jiaheng Liu; Yingshui Tan; Weixun Wang; Hui Huang; Xingyuan Bu; Hangyu Guo; Chengwei Hu; Boren Zheng; Zhuoran Lin; Dekai Sun; Zhicheng Zheng; Wenbo Su; Bo Zheng

doi:10.18653/v1/2025.acl-long.941

Abstract

New benchmarks are needed to keep pace with the rapid development of Large Language Models (LLMs). In this work, we present Chinese SimpleQA, the first comprehensive Chinese benchmark for evaluating the factuality of LLMs when answering short questions. Chinese SimpleQA has five main properties: Chinese, Diverse, High-quality, Static, and Easy-to-evaluate. Specifically, first, we focus on the Chinese language, covering 6 major topics with 99 diverse subtopics. Second, we conduct a comprehensive quality control process to obtain high-quality questions and answers, where the reference answers are static and do not change over time. Third, following SimpleQA, the questions and answers are very short, so responses are easy to grade. Based on Chinese SimpleQA, we perform a comprehensive evaluation of the factuality of existing LLMs. Finally, we hope that Chinese SimpleQA can help developers better understand the Chinese factuality of their models and facilitate the growth of LLMs.

Anthology ID: 2025.acl-long.941
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 19182–19208
URL: https://aclanthology.org/2025.acl-long.941/
DOI: 10.18653/v1/2025.acl-long.941
Cite (ACL): Yancheng He, Shilong Li, Jiaheng Liu, Yingshui Tan, Weixun Wang, Hui Huang, Xingyuan Bu, Hangyu Guo, Chengwei Hu, Boren Zheng, Zhuoran Lin, Dekai Sun, Zhicheng Zheng, Wenbo Su, and Bo Zheng. 2025. Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 19182–19208, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models (He et al., ACL 2025)
PDF: https://aclanthology.org/2025.acl-long.941.pdf
