LEGAL-BERT: The Muppets straight out of Law School

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos


Abstract
BERT has achieved impressive performance in several NLP tasks. However, there has been limited investigation on its adaptation guidelines in specialised domains. Here we focus on the legal domain, where we explore several approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets. Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain. Thus we propose a systematic investigation of the available strategies when applying BERT in specialised domains. These are: (a) use the original BERT out of the box, (b) adapt BERT by additional pre-training on domain-specific corpora, and (c) pre-train BERT from scratch on domain-specific corpora. We also propose a broader hyper-parameter search space when fine-tuning for downstream tasks and we release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.
Anthology ID:
2020.findings-emnlp.261
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2898–2904
Language:
URL:
https://aclanthology.org/2020.findings-emnlp.261
DOI:
10.18653/v1/2020.findings-emnlp.261
Bibkey:
Cite (ACL):
Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2020. LEGAL-BERT: The Muppets straight out of Law School. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898–2904, Online. Association for Computational Linguistics.
Cite (Informal):
LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., Findings 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.findings-emnlp.261.pdf
Optional supplementary material:
 2020.findings-emnlp.261.OptionalSupplementaryMaterial.pdf
Data
ECHREURLEX57K