Wanhong Huang
2024
LJPCheck: Functional Tests for Legal Judgment Prediction
Yuan Zhang | Wanhong Huang | Yi Feng | Chuanyi Li | Zhiwei Fei | Jidong Ge | Bin Luo | Vincent Ng
Findings of the Association for Computational Linguistics: ACL 2024
Legal Judgment Prediction (LJP) refers to the task of automatically predicting judgment results (e.g., charges, law articles, and term of penalty) from the fact description of a case. While SOTA models have achieved high accuracy and F1 scores on public datasets, existing datasets fail to evaluate specific aspects of these models (e.g., legal fairness) that significantly affect their use in real-world scenarios. Inspired by functional testing in software engineering, we introduce LJPCheck, a suite of functional tests for LJP models, to understand how LJP models behave and to offer diagnostic insights. We illustrate the utility of LJPCheck on five SOTA LJP models. Extensive experiments reveal vulnerabilities in these models, prompting an in-depth discussion of the underlying reasons for their shortcomings.
CMDL: A Large-Scale Chinese Multi-Defendant Legal Judgment Prediction Dataset
Wanhong Huang | Yi Feng | Chuanyi Li | Honghan Wu | Jidong Ge | Vincent Ng
Findings of the Association for Computational Linguistics: ACL 2024
Legal Judgment Prediction (LJP) has attracted significant attention in recent years. However, previous studies have primarily focused on cases involving only a single defendant, setting aside multi-defendant cases because of their complexity and difficulty. To advance research, we introduce CMDL, a large-scale real-world Chinese Multi-Defendant LJP dataset consisting of 393,945 cases with nearly 1.2 million defendants in total. For performance evaluation, we propose case-level evaluation metrics dedicated to the multi-defendant scenario. Experimental results on CMDL show that existing SOTA approaches exhibit weaknesses when applied to cases involving multiple defendants. We highlight several challenges that require attention and resolution.