KEVLAR: The Complete Resource for EuroVoc Classification of Legal Documents

Lorenzo Bocchi, Camilla Casula, Alessio Palmero Aprosio


Abstract
The use of Machine Learning and Artificial Intelligence in the Public Administration (PA) has increased in recent years. In particular, recent guidelines proposed by various governments for the classification of documents released by the PA suggest using the EuroVoc thesaurus. In this paper, we present KEVLAR, an all-in-one solution for performing this task on Public Administration acts. First, we create a collection of 8 million documents in 24 languages, tagged with EuroVoc labels, taken from EUR-Lex, the web portal of European Union legislation. Then, we train several pre-trained BERT-based models, comparing the performance of base models with that of domain-specific and multilingual ones. We release the corpus, the best-performing models, and a Docker image containing the source code of the trainer, the REST API, and the web interface. This image can be used out of the box for document classification.
Anthology ID:
2024.clicit-1.9
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
66–73
URL:
https://aclanthology.org/2024.clicit-1.9/
Cite (ACL):
Lorenzo Bocchi, Camilla Casula, and Alessio Palmero Aprosio. 2024. KEVLAR: The Complete Resource for EuroVoc Classification of Legal Documents. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 66–73, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
KEVLAR: The Complete Resource for EuroVoc Classification of Legal Documents (Bocchi et al., CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.9.pdf