Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts

Sezen Perçin, Andrea Galassi, Francesca Lagioia, Federico Ruggeri, Piera Santin, Giovanni Sartor, Paolo Torroni


Abstract
Creating balanced labeled textual corpora for complex tasks, like legal analysis, is a challenging and expensive process that often requires the collaboration of domain experts. To address this problem, we propose a data augmentation method based on the combination of GloVe word embeddings and the WordNet ontology. We present an example of application in the legal domain, specifically on decisions of the Court of Justice of the European Union. Our evaluation with human experts confirms that our method is more robust than the alternatives.
Anthology ID:
2022.nllp-1.4
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
47–52
URL:
https://aclanthology.org/2022.nllp-1.4
DOI:
10.18653/v1/2022.nllp-1.4
Cite (ACL):
Sezen Perçin, Andrea Galassi, Francesca Lagioia, Federico Ruggeri, Piera Santin, Giovanni Sartor, and Paolo Torroni. 2022. Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts. In Proceedings of the Natural Legal Language Processing Workshop 2022, pages 47–52, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts (Perçin et al., NLLP 2022)
PDF:
https://aclanthology.org/2022.nllp-1.4.pdf