Semantists at LegalLens-2024: Data-efficient Training of LLM’s for Legal Violation Identification

Kanagasabai Rajaraman, Hariram Veeramani


Abstract
In this paper, we describe our system for the LegalLens-2024 Shared Task on automatically identifying legal violations from unstructured text sources. We participate in Subtask B, Legal Natural Language Inference (L-NLI), which aims to predict the relationship between a premise summarizing a class action complaint and a hypothesis drawn from an online media text, indicating any association between the review and the complaint. The task is challenging because it provides only limited labelled data. We adopt LLM-based methods and explore various data-efficient learning approaches to maximize performance. Our best model employs an ensemble of LLMs fine-tuned on the task-specific data; it achieved a Macro F1 score of 78.5% on the test data and ranked 2nd among all teams' submissions.
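As a rough illustration of the evaluation setting described in the abstract, the sketch below combines the label predictions of several models and scores the result with Macro F1 (the unweighted mean of per-class F1). It is not the authors' implementation: the abstract does not say how the ensemble combines its members, so plurality voting is an assumption here, as are the three label names and the helper names `majority_vote` and `macro_f1`.

```python
from collections import Counter

# Assumed label set for illustration only; the abstract does not list the labels.
LABELS = ["entailed", "contradict", "neutral"]


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per example by plurality vote.

    Assumption: the abstract does not state the combination rule used by the
    ensemble; plurality voting is used here purely as an example.
    """
    ensembled = []
    for votes in zip(*predictions_per_model):
        # Ties are resolved in favour of the label that appears first in `votes`,
        # i.e. the prediction of the earliest-listed model.
        ensembled.append(Counter(votes).most_common(1)[0][0])
    return ensembled


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of per-class F1 scores over `labels`."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy data, invented for the example; not taken from the shared task.
    gold = ["entailed", "contradict", "neutral", "neutral"]
    model_a = ["entailed", "contradict", "neutral", "entailed"]
    model_b = ["entailed", "neutral", "neutral", "neutral"]
    model_c = ["contradict", "contradict", "neutral", "neutral"]

    ensemble = majority_vote([model_a, model_b, model_c])
    print("ensemble predictions:", ensemble)
    print("macro F1:", macro_f1(gold, ensemble))
```

Because Macro F1 weights every class equally, a model that neglects a rare class is penalized more heavily than it would be under plain accuracy, which matters when labelled data is limited.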
Anthology ID:
2024.nllp-1.31
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
355–360
URL:
https://aclanthology.org/2024.nllp-1.31
Cite (ACL):
Kanagasabai Rajaraman and Hariram Veeramani. 2024. Semantists at LegalLens-2024: Data-efficient Training of LLM’s for Legal Violation Identification. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 355–360, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Semantists at LegalLens-2024: Data-efficient Training of LLM’s for Legal Violation Identification (Rajaraman & Veeramani, NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.31.pdf