BLT: Can Large Language Models Handle Basic Legal Text?

Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme


Abstract
We find that the best publicly available LLMs like GPT-4 and Claude currently perform poorly on basic legal text handling. This motivates the creation of a benchmark consisting of examples that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs’ poor performance on this benchmark casts into doubt their reliability as-is for legal practice. However, fine-tuning on our training set brings even a small model to near-perfect performance. This benchmark will be useful for fine-tuning LLMs for downstream legal tasks, as well as for tracking LLMs’ reliability as-is for basic legal tasks.
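
To make the task format concrete, below is a minimal Python sketch of the kind of zero-shot lookup test the abstract describes: given a deposition transcript with numbered lines, the model must return the exact text at a given line. The function names, prompt wording, and exact-match scoring here are illustrative assumptions, not the paper's actual benchmark code.

```python
# Illustrative sketch (not the paper's benchmark code): builds one
# BLT-style zero-shot task -- look up the text at a given line of a
# numbered deposition transcript -- and scores a model's answer.

def make_deposition_task(lines: list[str], target: int) -> dict:
    """Number the transcript lines and ask for the text at `target`."""
    transcript = "\n".join(f"{i + 1}: {line}" for i, line in enumerate(lines))
    prompt = (
        "Below is a deposition transcript with numbered lines.\n\n"
        f"{transcript}\n\n"
        f"What is the exact text on line {target}? Reply with only that text."
    )
    return {"prompt": prompt, "answer": lines[target - 1]}

def is_correct(model_output: str, answer: str) -> bool:
    """Exact-match scoring after trimming whitespace (an assumed metric)."""
    return model_output.strip() == answer.strip()

if __name__ == "__main__":
    # Toy transcript; a real benchmark instance would be far longer.
    deposition = [
        "Q. Please state your name for the record.",
        "A. Jane Doe.",
        "Q. Where were you on the night of June 4th?",
        "A. At home, watching television.",
    ]
    task = make_deposition_task(deposition, target=3)
    print(task["prompt"])
    print("Expected:", task["answer"])
```

A lawyer or paralegal would treat this lookup as trivial, which is why near-ceiling performance is the natural expectation for any model used in legal practice.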
Anthology ID: 2024.nllp-1.18
Volume: Proceedings of the Natural Legal Language Processing Workshop 2024
Month: November
Year: 2024
Address: Miami, FL, USA
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue: NLLP
Publisher: Association for Computational Linguistics
Pages: 216–232
URL: https://aclanthology.org/2024.nllp-1.18
Cite (ACL): Andrew Blair-Stanek, Nils Holzenberger, and Benjamin Van Durme. 2024. BLT: Can Large Language Models Handle Basic Legal Text?. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 216–232, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal): BLT: Can Large Language Models Handle Basic Legal Text? (Blair-Stanek et al., NLLP 2024)
PDF: https://aclanthology.org/2024.nllp-1.18.pdf