UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models

Ruoli Gan; Duanyu Feng; Chen Zhang; Zhihang Lin; Haochen Jia; Hao Wang; Zhenyang Cai; Lei Cui; Qianqian Xie; Jimin Huang; Benyou Wang

doi:10.18653/v1/2025.findings-naacl.444

Abstract

Existing legal benchmarks focusing on knowledge and logic effectively evaluate LLMs on various tasks in the legal domain. However, few have explored the practical application of LLMs by actual users. To further assess whether LLMs meet the specific needs of legal practitioners in real-world scenarios, we introduce UCL-Bench, a Chinese User-Centric Legal Benchmark, comprising 22 tasks across 5 distinct legal scenarios. To build UCL-Bench, we conduct a user survey targeting legal professionals to understand their needs and challenges. Based on the survey results, we craft tasks, have them verified by legal professionals, and categorize them according to Bloom’s taxonomy. Each task in UCL-Bench mirrors real-world legal scenarios, and instead of relying on pre-defined answers, legal experts provide detailed answer guidance for each task, incorporating both “information” and “needs” elements to mimic the complexities of legal practice. With this guidance, we use GPT-4 as both the user simulator and the evaluator, enabling multi-turn dialogues within an answer-guidance-based evaluation framework. Our findings reveal that many recent open-source general models achieve the highest performance, suggesting that they are well-suited to address the needs of legal practitioners. However, legal LLMs do not outperform ChatGPT, indicating a need for training strategies aligned with users’ needs. Furthermore, we find that the most effective models are able to address legal issues within fewer dialogue turns, highlighting the importance of concise and accurate responses in achieving high performance. The code and dataset are available at https://github.com/wittenberg11/UCL-bench.

Anthology ID: 2025.findings-naacl.444
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 7945–7988
URL: https://aclanthology.org/2025.findings-naacl.444/
DOI: 10.18653/v1/2025.findings-naacl.444
Cite (ACL): Ruoli Gan, Duanyu Feng, Chen Zhang, Zhihang Lin, Haochen Jia, Hao Wang, Zhenyang Cai, Lei Cui, Qianqian Xie, Jimin Huang, and Benyou Wang. 2025. UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7945–7988, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): UCL-Bench: A Chinese User-Centric Legal Benchmark for Large Language Models (Gan et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.444.pdf
