Information Extraction for Planning Court Cases

Drish Mali, Rubash Mali, Claire Barale


Abstract
Legal documents are often long and unstructured, making them challenging and time-consuming to comprehend. An automatic system that can identify relevant entities and labels within legal documents would significantly reduce legal research time. We developed a system to streamline the analysis of legal cases from planning courts by extracting key information from XML files using Named Entity Recognition (NER) and multi-label classification models and converting it into structured form. This research contributes three novel datasets for Planning Court cases: a NER dataset, a multi-label dataset fully annotated by humans, and newly re-annotated multi-label datasets partially annotated using LLMs. We experimented with various general-purpose and legal domain-specific models with different maximum sequence lengths. Incorporating paragraph position information improved model performance on the multi-label classification task. Our research highlights the importance of domain-specific models, with LegalRoBERTa and LexLM demonstrating the best performance.
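The abstract does not say how paragraph position information is supplied to the classifier. As a minimal sketch of one possible approach, the Python below reads paragraphs from an XML judgment, computes each paragraph's relative position, and prefixes the text with a coarse position token. The `<p>` tag, the `[POS_k]` token format, the bucket count, and both function names are assumptions made for illustration; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def paragraphs_with_position(xml_text, tag="p"):
    """Return (relative_position, text) pairs for each non-empty <tag> element.

    relative_position is index / (n - 1), so the first paragraph is 0.0 and
    the last is 1.0. A document with a single paragraph gets 0.0.
    The tag name "p" is an assumption about the XML schema.
    """
    root = ET.fromstring(xml_text)
    texts = ["".join(el.itertext()).strip() for el in root.iter(tag)]
    texts = [t for t in texts if t]
    n = len(texts)
    return [(i / (n - 1) if n > 1 else 0.0, t) for i, t in enumerate(texts)]


def to_model_input(position, text, buckets=10):
    """Prefix text with a coarse position token (hypothetical format)."""
    # Clamp so that position == 1.0 falls into the last bucket.
    bucket = min(int(position * buckets), buckets - 1)
    return f"[POS_{bucket}] {text}"


# Toy document, not real case data.
SAMPLE = (
    "<judgment><p>Introduction.</p><p>The facts.</p>"
    "<p>Conclusion.</p></judgment>"
)

if __name__ == "__main__":
    for pos, text in paragraphs_with_position(SAMPLE):
        print(to_model_input(pos, text))
```

The prefixed strings could then be tokenized and passed to any multi-label text classifier. Bucketing the position keeps the number of extra tokens small, which is a design choice of this sketch rather than something the abstract reports.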
Anthology ID:
2024.nllp-1.8
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
97–114
URL:
https://aclanthology.org/2024.nllp-1.8
Cite (ACL):
Drish Mali, Rubash Mali, and Claire Barale. 2024. Information Extraction for Planning Court Cases. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 97–114, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Information Extraction for Planning Court Cases (Mali et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.8.pdf