LeGen: Complex Information Extraction from Legal sentences using Generative Models

Chaitra C R, Sankalp Kulkarni, Sai Rama Akash Varma Sagi, Shashank Pandey, Rohit Yalavarthy, Dipanjan Chakraborty, Prajna Devi Upadhyay


Abstract
Constructing legal knowledge graphs from unstructured legal texts is challenging because of the intricate nature of legal language. Open information extraction (OIE) techniques can convert text into triples of the form (subject, relation, object), but they often fail to capture the nuanced relationships within lengthy legal sentences. This gap calls for more sophisticated approaches, known as complex information extraction. This paper proposes LeGen, an end-to-end approach that leverages pre-trained large language models (GPT-4o, T5, BART) to perform complex information extraction from legal sentences. LeGen learns and represents the discourse structure of legal sentences, capturing both their complexity and their semantics. It minimizes the error propagation typical of multi-step pipelines and achieves up to a 32.2% gain on the Indian Legal benchmark. It also demonstrates competitive performance on open information extraction benchmarks. A promising application of the resulting legal knowledge graphs is the development of question-answering systems for government schemes, tailored to the Next Billion Users who struggle with the complexity of legal language. Our code and data are available at https://github.com/prajnaupadhyay/LegalIE
Anthology ID:
2024.nllp-1.1
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
1–17
URL:
https://aclanthology.org/2024.nllp-1.1
Cite (ACL):
Chaitra C R, Sankalp Kulkarni, Sai Rama Akash Varma Sagi, Shashank Pandey, Rohit Yalavarthy, Dipanjan Chakraborty, and Prajna Devi Upadhyay. 2024. LeGen: Complex Information Extraction from Legal sentences using Generative Models. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 1–17, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
LeGen: Complex Information Extraction from Legal sentences using Generative Models (C R et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.1.pdf