A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction

Ruihao Shui, Yixin Cao, Xiang Wang, Tat-Seng Chua


Abstract
Large language models (LLMs) have demonstrated great potential for domain-specific applications, such as the law domain. However, recent disputes over GPT-4’s law evaluation raise questions concerning their performance in real-world legal tasks. To systematically investigate their competency in the law, we design practical baseline solutions based on LLMs and test them on the task of legal judgment prediction. In our solutions, LLMs can work alone to answer open questions or coordinate with an information retrieval (IR) system to learn from similar cases or solve simplified multi-choice questions. We show that similar cases and multi-choice options, namely label candidates, included in prompts can help LLMs recall domain knowledge that is critical for expert legal reasoning. We additionally present an intriguing paradox wherein an IR system alone surpasses the performance of LLM+IR, because weaker LLMs gain little from powerful IR systems. In such cases, the role of the LLM becomes redundant. Our evaluation pipeline can be easily extended to other tasks to facilitate evaluations in other domains. Code is available at https://github.com/srhthu/LM-CompEval-Legal
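The abstract describes prompting an LLM with retrieved similar cases and a set of label candidates. The sketch below illustrates that setup under stated assumptions: the toy TF-IDF retriever stands in for the paper's IR system, and the function names and prompt wording (`tfidf_retrieve`, `build_prompt`) are illustrative, not the authors' implementation.

```python
from collections import Counter
import math

def tfidf_retrieve(query, corpus, top_k=2):
    """Rank corpus cases by cosine similarity of bag-of-words TF-IDF
    vectors. A toy stand-in for the IR system in the abstract."""
    def tokenize(text):
        return text.lower().split()
    docs = [tokenize(c) for c in corpus]
    # Document frequency of each term across the corpus.
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * math.log((n + 1) / (df.get(t, 0) + 1)) for t in tf}
    def cos(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    qv = vec(tokenize(query))
    ranked = sorted(range(n), key=lambda i: cos(qv, vec(docs[i])), reverse=True)
    return [corpus[i] for i in ranked[:top_k]]

def build_prompt(fact_description, similar_cases, label_candidates):
    """Assemble a multi-choice judgment-prediction prompt: retrieved
    similar cases for in-context grounding, then candidate charges
    (label candidates) to choose from."""
    parts = ["You are a legal assistant. Predict the charge for the case below."]
    for i, case in enumerate(similar_cases, 1):
        parts.append(f"Similar case {i}: {case}")
    parts.append(f"Case facts: {fact_description}")
    parts.append("Candidate charges: " + "; ".join(label_candidates))
    parts.append("Answer with exactly one candidate charge.")
    return "\n".join(parts)
```

Restricting the answer space to label candidates turns the open-ended prediction into a simplified multi-choice question, which the abstract reports helps LLMs recall the relevant domain knowledge.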
Anthology ID:
2023.findings-emnlp.490
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7337–7348
URL:
https://aclanthology.org/2023.findings-emnlp.490
DOI:
10.18653/v1/2023.findings-emnlp.490
Cite (ACL):
Ruihao Shui, Yixin Cao, Xiang Wang, and Tat-Seng Chua. 2023. A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7337–7348, Singapore. Association for Computational Linguistics.
Cite (Informal):
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction (Shui et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.490.pdf
Video:
https://aclanthology.org/2023.findings-emnlp.490.mp4