LAiW: A Chinese Legal Large Language Models Benchmark

Yongfu Dai; Duanyu Feng; Jimin Huang; Haochen Jia; Qianqian Xie; Yifang Zhang; Weiguang Han; Wei Tian; Hao Wang

Abstract

General and legal-domain LLMs have demonstrated strong performance on various LegalAI tasks. However, their current evaluations are not aligned with the fundamental logic of legal reasoning, the legal syllogism, which hinders trust and understanding from legal experts. To bridge this gap, we introduce LAiW, a Chinese legal LLM benchmark structured around the legal syllogism. We evaluate legal LLMs across three levels of capability, each reflecting a progressively more complex stage of the legal syllogism: fundamental information retrieval, legal principles inference, and advanced legal applications. Together, these levels encompass a wide range of tasks in different legal scenarios. Our automatic evaluation reveals that LLMs, despite their ability to answer complex legal questions, lack the inherent logical processes of the legal syllogism. This limitation poses a barrier to acceptance by legal professionals. Furthermore, manual evaluation with legal experts confirms this issue and highlights the importance of pre-training on legal text for strengthening the legal syllogism reasoning of LLMs. Future research may prioritize addressing this gap to unlock the full potential of LLMs in legal applications.

Anthology ID: 2025.coling-main.716
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 10738–10766
URL: https://aclanthology.org/2025.coling-main.716/
Cite (ACL): Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, and Hao Wang. 2025. LAiW: A Chinese Legal Large Language Models Benchmark. In Proceedings of the 31st International Conference on Computational Linguistics, pages 10738–10766, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): LAiW: A Chinese Legal Large Language Models Benchmark (Dai et al., COLING 2025)
PDF: https://aclanthology.org/2025.coling-main.716.pdf
