AsyLex: A Dataset for Legal Language Processing of Refugee Claims

Claire Barale, Mark Klaisoongnoen, Pasquale Minervini, Michael Rovatsos, Nehal Bhuta


Abstract
Advancements in natural language processing (NLP) and language models have demonstrated immense potential in the legal domain, enabling automated analysis and comprehension of legal texts. However, developing robust models in Legal NLP is significantly challenged by the scarcity of resources. To address this gap, this paper presents AsyLex, the first dataset specifically designed for Refugee Law applications. The dataset comprises 59,112 documents on refugee status determination in Canada from 1996 to 2022, providing researchers and practitioners with essential material for training and evaluating NLP models for legal research and case review. We define case review as comprising entity extraction and outcome prediction tasks. The dataset includes 19,115 gold-standard human-labeled annotations for 20 legally relevant entity types, curated with the help of legal experts, and 1,682 gold-standard labeled documents for the case outcome. Furthermore, we supply the corresponding trained entity extraction models and the labeled entities generated by running inference with them on AsyLex. Four supplementary features are obtained through rule-based extraction. We demonstrate the usefulness of our dataset on the legal judgment prediction task, predicting the binary case outcome and testing a set of baselines that use the text of the documents and our annotations. We observe that models pretrained on similar legal documents reach better scores, suggesting that acquiring more datasets for specialized domains such as law is crucial.
Anthology ID:
2023.nllp-1.24
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Daniel Preoțiuc-Pietro, Catalina Goanta, Ilias Chalkidis, Leslie Barrett, Gerasimos (Jerry) Spanakis, Nikolaos Aletras
Venues:
NLLP | WS
Publisher:
Association for Computational Linguistics
Pages:
244–257
URL:
https://aclanthology.org/2023.nllp-1.24
DOI:
10.18653/v1/2023.nllp-1.24
Cite (ACL):
Claire Barale, Mark Klaisoongnoen, Pasquale Minervini, Michael Rovatsos, and Nehal Bhuta. 2023. AsyLex: A Dataset for Legal Language Processing of Refugee Claims. In Proceedings of the Natural Legal Language Processing Workshop 2023, pages 244–257, Singapore. Association for Computational Linguistics.
Cite (Informal):
AsyLex: A Dataset for Legal Language Processing of Refugee Claims (Barale et al., NLLP-WS 2023)
PDF:
https://aclanthology.org/2023.nllp-1.24.pdf