Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation

Ilias Chalkidis, Emmanouil Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos


Abstract
We consider the task of Extreme Multi-Label Text Classification (XMTC) in the legal domain. We release a new dataset of 57k legislative documents from EURLEX, the European Union’s public document database, annotated with concepts from EUROVOC, a multidisciplinary thesaurus. The dataset is substantially larger than previous EURLEX datasets and suitable for XMTC, few-shot and zero-shot learning. Experimenting with several neural classifiers, we show that BIGRUs with self-attention outperform the current multi-label state-of-the-art methods, which employ label-wise attention. Replacing CNNs with BIGRUs in label-wise attention networks leads to the best overall performance.
Anthology ID:
W19-2209
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2019
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Nikolaos Aletras, Elliott Ash, Leslie Barrett, Daniel Chen, Adam Meyers, Daniel Preotiuc-Pietro, David Rosenberg, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
78–87
URL:
https://aclanthology.org/W19-2209
DOI:
10.18653/v1/W19-2209
Cite (ACL):
Ilias Chalkidis, Emmanouil Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2019. Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation. In Proceedings of the Natural Legal Language Processing Workshop 2019, pages 78–87, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Extreme Multi-Label Legal Text Classification: A Case Study in EU Legislation (Chalkidis et al., NAACL 2019)
PDF:
https://aclanthology.org/W19-2209.pdf