BibTeX
@inproceedings{purushothama-etal-2025-ready,
    title = "Not ready for the bench: {LLM} legal interpretation is unstable and uncalibrated to human judgments",
    author = "Purushothama, Abhishek and
      Min, Junghyun and
      Waldon, Brandon and
      Schneider, Nathan",
    editor = "Aletras, Nikolaos and
      Chalkidis, Ilias and
      Barrett, Leslie and
      Goanț{\u{a}}, C{\u{a}}t{\u{a}}lina and
      Preoțiuc-Pietro, Daniel and
      Spanakis, Gerasimos",
    booktitle = "Proceedings of the Natural Legal Language Processing Workshop 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.nllp-1.22/",
    pages = "317--317",
    ISBN = "979-8-89176-338-8",
    abstract = "Legal interpretation frequently involves assessing how a legal text, as understood by an `ordinary' speaker of the language, applies to the set of facts characterizing a legal dispute. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments and are susceptible to subtle variations in the prompt. While instruction tuning slightly improves model calibration to human judgments, even the best-calibrated LLMs remain weak predictors of human native speakers' judgments."
}

MODS XML
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="purushothama-etal-2025-ready">
    <titleInfo>
      <title>Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Abhishek</namePart>
      <namePart type="family">Purushothama</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Junghyun</namePart>
      <namePart type="family">Min</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Brandon</namePart>
      <namePart type="family">Waldon</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Nathan</namePart>
      <namePart type="family">Schneider</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the Natural Legal Language Processing Workshop 2025</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Nikolaos</namePart>
        <namePart type="family">Aletras</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ilias</namePart>
        <namePart type="family">Chalkidis</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Leslie</namePart>
        <namePart type="family">Barrett</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Cătălina</namePart>
        <namePart type="family">Goanță</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Daniel</namePart>
        <namePart type="family">Preoțiuc-Pietro</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Gerasimos</namePart>
        <namePart type="family">Spanakis</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Suzhou, China</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-338-8</identifier>
    </relatedItem>
    <abstract>Legal interpretation frequently involves assessing how a legal text, as understood by an ‘ordinary’ speaker of the language, applies to the set of facts characterizing a legal dispute. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments and are susceptible to subtle variations in the prompt. While instruction tuning slightly improves model calibration to human judgments, even the best-calibrated LLMs remain weak predictors of human native speakers’ judgments.</abstract>
    <identifier type="citekey">purushothama-etal-2025-ready</identifier>
    <location>
      <url>https://aclanthology.org/2025.nllp-1.22/</url>
    </location>
    <part>
      <date>2025-11</date>
      <extent unit="page">
        <start>317</start>
        <end>317</end>
      </extent>
    </part>
  </mods>
</modsCollection>

Endnote
%0 Conference Proceedings
%T Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments
%A Purushothama, Abhishek
%A Min, Junghyun
%A Waldon, Brandon
%A Schneider, Nathan
%Y Aletras, Nikolaos
%Y Chalkidis, Ilias
%Y Barrett, Leslie
%Y Goanță, Cătălina
%Y Preoțiuc-Pietro, Daniel
%Y Spanakis, Gerasimos
%S Proceedings of the Natural Legal Language Processing Workshop 2025
%D 2025
%8 November
%I Association for Computational Linguistics
%C Suzhou, China
%@ 979-8-89176-338-8
%F purushothama-etal-2025-ready
%X Legal interpretation frequently involves assessing how a legal text, as understood by an ‘ordinary’ speaker of the language, applies to the set of facts characterizing a legal dispute. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM-assisted interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments and are susceptible to subtle variations in the prompt. While instruction tuning slightly improves model calibration to human judgments, even the best-calibrated LLMs remain weak predictors of human native speakers’ judgments.
%U https://aclanthology.org/2025.nllp-1.22/
%P 317-317

Markdown (Informal)
[Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments](https://aclanthology.org/2025.nllp-1.22/) (Purushothama et al., NLLP 2025)
ACL
Abhishek Purushothama, Junghyun Min, Brandon Waldon, and Nathan Schneider. 2025. Not ready for the bench: LLM legal interpretation is unstable and uncalibrated to human judgments. In Proceedings of the Natural Legal Language Processing Workshop 2025, pages 317–317, Suzhou, China. Association for Computational Linguistics.