2024
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei | Xiaoyu Shen | Dawei Zhu | Fengzhe Zhou | Zhuo Han | Alan Huang | Songyang Zhang | Kai Chen | Zhixin Yin | Zongwen Shen | Jidong Ge | Vincent Ng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
We present LawBench, the first evaluation benchmark composed of 20 tasks designed to assess the ability of Large Language Models (LLMs) to perform Chinese legal tasks. LawBench is meticulously crafted to enable precise assessment of LLMs’ legal capabilities at three cognitive levels corresponding to the widely accepted Bloom’s cognitive taxonomy. Using LawBench, we present a comprehensive evaluation of 21 popular LLMs and the first comparative analysis of the empirical results to reveal their relative strengths and weaknesses. All data, model predictions, and evaluation code are accessible from https://github.com/open-compass/LawBench.
LJPCheck: Functional Tests for Legal Judgment Prediction
Yuan Zhang | Wanhong Huang | Yi Feng | Chuanyi Li | Zhiwei Fei | Jidong Ge | Bin Luo | Vincent Ng
Findings of the Association for Computational Linguistics: ACL 2024
Legal Judgment Prediction (LJP) refers to the task of automatically predicting judgment results (e.g., charges, law articles, and term of penalty) given the fact description of a case. While SOTA models have achieved high accuracy and F1 scores on public datasets, existing datasets fail to evaluate specific aspects of these models (e.g., legal fairness) that significantly impact their application in real-world scenarios. Inspired by functional testing in software engineering, we introduce LJPCheck, a suite of functional tests for LJP models, to probe LJP models’ behaviors and offer diagnostic insights. We illustrate the utility of LJPCheck on five SOTA LJP models. Extensive experiments reveal vulnerabilities in these models, prompting an in-depth discussion of the underlying reasons for their shortcomings.
CMDL: A Large-Scale Chinese Multi-Defendant Legal Judgment Prediction Dataset
Wanhong Huang | Yi Feng | Chuanyi Li | Honghan Wu | Jidong Ge | Vincent Ng
Findings of the Association for Computational Linguistics: ACL 2024
Legal Judgment Prediction (LJP) has attracted significant attention in recent years. However, previous studies have primarily focused on cases involving only a single defendant, skipping multi-defendant cases due to their complexity and difficulty. To advance research, we introduce CMDL, a large-scale real-world Chinese Multi-Defendant LJP dataset, which consists of 393,945 cases with nearly 1.2 million defendants in total. For performance evaluation, we propose case-level evaluation metrics dedicated to the multi-defendant scenario. Experimental results on CMDL show that existing SOTA approaches demonstrate weaknesses when applied to cases involving multiple defendants. We highlight several challenges that require attention and resolution.
2021
Don’t Miss the Potential Customers! Retrieving Similar Ads to Improve User Targeting
Yi Feng | Ting Wang | Chuanyi Li | Vincent Ng | Jidong Ge | Bin Luo | Yucheng Hu | Xiaopeng Zhang
Findings of the Association for Computational Linguistics: EMNLP 2021
User targeting is an essential task in the modern advertising industry: given a package of ads for a particular category of products (e.g., green tea), identify the online users to whom the ad package should be targeted. An ad-package-specific user targeting model is typically trained on historical click-through data: positive instances correspond to users who have clicked on an ad in the package before, whereas negative instances correspond to users who have not clicked on any ads in the package that were displayed to them. Collecting a sufficient amount of positive training data for an accurate user targeting model, however, is by no means trivial. This paper focuses on the development of a method for automatically augmenting the set of positive training instances. Experimental results on two datasets, including a real-world company dataset, demonstrate the effectiveness of our proposed method.
2020
Identifying Exaggerated Language
Li Kong | Chuanyi Li | Jidong Ge | Bin Luo | Vincent Ng
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
While hyperbole is one of the most prevalent rhetorical devices, it is arguably one of the least studied in the figurative language processing community. We contribute to the study of hyperbole by (1) creating a corpus for sentence-level hyperbole detection, (2) performing a statistical and manual analysis of our corpus, and (3) addressing the automatic hyperbole detection task.