Shashank Pandey
2024
LeGen: Complex Information Extraction from Legal sentences using Generative Models
Chaitra C R | Sankalp Kulkarni | Sai Rama Akash Varma Sagi | Shashank Pandey | Rohit Yalavarthy | Dipanjan Chakraborty | Prajna Devi Upadhyay
Proceedings of the Natural Legal Language Processing Workshop 2024
Constructing legal knowledge graphs from unstructured legal texts is a complex challenge due to the intricate nature of legal language. While open information extraction (OIE) techniques can convert text into triples of the form (subject, relation, object), they often fall short of capturing the nuanced relationships within lengthy legal sentences, necessitating more sophisticated approaches known as complex information extraction. This paper proposes LeGen, an end-to-end approach leveraging pre-trained large language models (GPT-4o, T5, BART) to perform complex information extraction from legal sentences. LeGen learns and represents the discourse structure of legal sentences, capturing both their complexity and semantics. It minimizes the error propagation typical of multi-step pipelines and achieves up to a 32.2% gain on the Indian Legal benchmark. Additionally, it demonstrates competitive performance on open information extraction benchmarks. A promising application of the resulting legal knowledge graphs is in developing question-answering systems for government schemes, tailored to the Next Billion Users who struggle with the complexity of legal language. Our code and data are available at https://github.com/prajnaupadhyay/LegalIE