LAiW: A Chinese Legal Large Language Models Benchmark
Yongfu Dai | Duanyu Feng | Jimin Huang | Haochen Jia | Qianqian Xie | Yifang Zhang | Weiguang Han | Wei Tian | Hao Wang
Proceedings of the 31st International Conference on Computational Linguistics, 2025
General and legal domain LLMs have demonstrated strong performance on various LegalAI tasks. However, their current evaluations lack alignment with the fundamental logic of legal reasoning, the legal syllogism. This hinders trust and understanding from legal experts. To bridge this gap, we introduce LAiW, a Chinese legal LLM benchmark structured around the legal syllogism. We evaluate legal LLMs across three levels of capability, each reflecting a progressively more complex stage of the legal syllogism: fundamental information retrieval, legal principles inference, and advanced legal applications, encompassing a wide range of tasks in different legal scenarios. Our automatic evaluation reveals that LLMs, despite their ability to answer complex legal questions, lack the inherent logical processes of the legal syllogism. This limitation poses a barrier to acceptance by legal professionals. Furthermore, manual evaluation with legal experts confirms this issue and highlights the importance of pre-training on legal texts to enhance LLMs' capacity for legal syllogism. Future research may prioritize addressing this gap to unlock the full potential of LLMs in legal applications.