Title Is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws

Lorenzo Bocchi, Alessio Palmero Aprosio


Abstract
The use of Machine Learning and Artificial Intelligence approaches within Public Administration (PA) has grown significantly in recent years. Specifically, new guidelines from various governments recommend employing the EuroVoc thesaurus for the classification of documents issued by the PA. In this paper, we explore some methods to perform document classification in the legal domain, in order to mitigate the length limitation for input texts in BERT models. We first collect data from the European Union, already tagged with the aforementioned taxonomy. Then we reorder the sentences included in the text, with the aim of moving the most informative part of the document to the beginning of the text. Results show that the title and the context are both important, although the order of the text may not be. Finally, we release on GitHub both the dataset and the source code used for the experiments.
Anthology ID:
2024.clicit-1.10
Volume:
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Month:
December
Year:
2024
Address:
Pisa, Italy
Editors:
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni, Rachele Sprugnoli
Venue:
CLiC-it
Publisher:
CEUR Workshop Proceedings
Pages:
74–80
URL:
https://aclanthology.org/2024.clicit-1.10/
Cite (ACL):
Lorenzo Bocchi and Alessio Palmero Aprosio. 2024. Title Is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws. In Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024), pages 74–80, Pisa, Italy. CEUR Workshop Proceedings.
Cite (Informal):
Title Is (Not) All You Need for EuroVoc Multi-Label Classification of European Laws (Bocchi & Palmero Aprosio, CLiC-it 2024)
PDF:
https://aclanthology.org/2024.clicit-1.10.pdf