CourtReasoner: Can LLM Agents Reason Like Judges?

Sophia Simeng Han; Yoshiki Takashima; Shannon Zejiang Shen; Chen Liu; Yixin Liu; Roque K. Thuo; Sonia Knowlton; Ruzica Piskac; Scott J Shapiro; Arman Cohan

doi:10.18653/v1/2025.emnlp-main.1787

Abstract

LLMs are increasingly applied in the legal domain in tasks such as summarizing legal texts and providing basic legal advice. Yet their capacity for more complex legal work, such as generating entire judicial reasoning sections in U.S. court decisions, remains under-explored. Given the continued adoption of LLMs and the significance of law to society at large, measuring LLMs’ legal reasoning capabilities is a pressing task. We propose CourtReasoner, a novel expert-annotated judicial reasoning benchmark for evaluating LLM agents’ capabilities in complex legal reasoning. Sourcing U.S. court opinions, we construct benchmarks that measure LLMs’ ability to construct goal-oriented legal reasoning. CourtReasoner measures an agent’s ability to argue both ways in a legal dispute, rather than its performance on simple Q/A. Our results show that more than 60% of frontier model outputs contain invalid arguments and more than 53% contain irrelevant citations when conducting complex legal reasoning. We also introduce a meta-evaluation benchmark to provide insights into the capabilities of LLMs as evaluators of legal reasoning. We will release our data, code, and full annotation guidelines publicly for future research.

Anthology ID: 2025.emnlp-main.1787
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 35291–35306
URL: https://aclanthology.org/2025.emnlp-main.1787/
DOI: 10.18653/v1/2025.emnlp-main.1787
Cite (ACL): Sophia Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen, Chen Liu, Yixin Liu, Roque K. Thuo, Sonia Knowlton, Ruzica Piskac, Scott J Shapiro, and Arman Cohan. 2025. CourtReasoner: Can LLM Agents Reason Like Judges?. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 35291–35306, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): CourtReasoner: Can LLM Agents Reason Like Judges? (Han et al., EMNLP 2025)
PDF: https://aclanthology.org/2025.emnlp-main.1787.pdf
Checklist: 2025.emnlp-main.1787.checklist.pdf
