ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

Siying Zhou; Yiquan Wu; Hui Chen; Xueyu Hu; Kun Kuang; Adam Jatowt; Chunyan Zheng; Fei Wu

doi:10.18653/v1/2025.findings-emnlp.658

Abstract

Legal claims are the plaintiff’s demands in a case and are essential to guiding judicial reasoning and case resolution. While many works have focused on improving the efficiency of legal professionals, research on helping non-professionals (e.g., plaintiffs) remains unexplored. This paper explores the problem of generating legal claims from the facts of a given case. First, we construct ClaimGen-CN, the first dataset for the Chinese legal claim generation task, from various real-world legal disputes. Additionally, we design an evaluation metric tailored to assessing the generated claims, which covers two essential dimensions: factuality and clarity. Building on this, we conduct a comprehensive zero-shot evaluation of state-of-the-art general and legal-domain large language models. Our findings highlight the limitations of current models in factual precision and expressive clarity, pointing to the need for more targeted development in this domain. To encourage further exploration of this important task, we will make the dataset publicly available.

Anthology ID: 2025.findings-emnlp.658
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12296–12323
URL: https://aclanthology.org/2025.findings-emnlp.658/
DOI: 10.18653/v1/2025.findings-emnlp.658
Cite (ACL): Siying Zhou, Yiquan Wu, Hui Chen, Xueyu Hu, Kun Kuang, Adam Jatowt, Chunyan Zheng, and Fei Wu. 2025. ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 12296–12323, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation (Zhou et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-emnlp.658.pdf
Checklist: 2025.findings-emnlp.658.checklist.pdf
