BLT: Can Large Language Models Handle Basic Legal Text?

Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme


Abstract
We find that the best publicly available LLMs like GPT-4 and Claude currently perform poorly on basic legal text handling. This motivates the creation of a benchmark consisting of examples that lawyers and paralegals would expect LLMs to handle zero-shot, such as looking up the text at a line of a witness deposition or at a subsection of a contract. LLMs’ poor performance on this benchmark casts into doubt their reliability as-is for legal practice. However, fine-tuning on our training set brings even a small model to near-perfect performance. This benchmark will be useful for fine-tuning LLMs for downstream legal tasks, as well as for tracking LLMs’ reliability as-is for basic legal tasks.
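
To make the task format concrete, below is a minimal Python sketch of the kind of zero-shot lookup test the abstract describes: given a deposition transcript with numbered lines, the model must return the exact text at a given line. The function names, prompt wording, and exact-match scoring here are illustrative assumptions, not the paper's actual benchmark code.

```python
# Illustrative sketch (not the paper's benchmark code): builds one
# BLT-style zero-shot task -- look up the text at a given line of a
# numbered deposition transcript -- and scores a model's answer.

def make_deposition_task(lines: list[str], target: int) -> dict:
    """Number the transcript lines and ask for the text at `target`."""
    transcript = "\n".join(f"{i + 1}: {line}" for i, line in enumerate(lines))
    prompt = (
        "Below is a deposition transcript with numbered lines.\n\n"
        f"{transcript}\n\n"
        f"What is the exact text on line {target}? Reply with only that text."
    )
    return {"prompt": prompt, "answer": lines[target - 1]}

def is_correct(model_output: str, answer: str) -> bool:
    """Exact-match scoring after trimming whitespace (an assumed metric)."""
    return model_output.strip() == answer.strip()

if __name__ == "__main__":
    # Toy transcript; a real benchmark instance would be far longer.
    deposition = [
        "Q. Please state your name for the record.",
        "A. Jane Doe.",
        "Q. Where were you on the night of June 4th?",
        "A. At home, watching television.",
    ]
    task = make_deposition_task(deposition, target=3)
    print(task["prompt"])
    print("Expected:", task["answer"])
```

A lawyer or paralegal would treat this lookup as trivial, which is why near-ceiling performance is the natural expectation for any model used in legal practice.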
Anthology ID: 2024.nllp-1.18
Volume: Proceedings of the Natural Legal Language Processing Workshop 2024
Month: November
Year: 2024
Address: Miami, FL, USA
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue: NLLP
Publisher: Association for Computational Linguistics
Pages: 216–232
URL: https://aclanthology.org/2024.nllp-1.18
Cite (ACL): Andrew Blair-Stanek, Nils Holzenberger, and Benjamin Van Durme. 2024. BLT: Can Large Language Models Handle Basic Legal Text?. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 216–232, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal): BLT: Can Large Language Models Handle Basic Legal Text? (Blair-Stanek et al., NLLP 2024)
PDF: https://aclanthology.org/2024.nllp-1.18.pdf