Cross Examine: An Ensemble-based approach to leverage Large Language Models for Legal Text Analytics

Saurav Chowdhury, Lipika Dey, Suyog Joshi


Abstract
Legal documents are complex in nature, describing the course of argumentative reasoning followed to settle a case. Churning through large volumes of legal documents is a daily requirement for many professionals who need access to the information embedded in them. Natural language processing methods that support document summarization with key information components, insight extraction, and question answering play a crucial role in legal text processing. Most existing document analysis systems use supervised machine learning, which requires large volumes of annotated training data for each application and makes these systems expensive to build. In this paper we propose a legal text analytics pipeline using Large Language Models (LLMs), which can work with little or no training data. For document summarization, we propose an iterative pipeline using retrieval-augmented generation to ensure that the generated text remains contextually relevant. For question answering, we propose a novel ontology-driven ensemble approach, similar to cross-examination, that exploits questioning and verification principles. A knowledge graph, created from the extracted information, stores the key entities and relationships reflecting the content structure of the repository. A new dataset is created from Indian court documents related to bail applications for cases filed under the Protection of Children from Sexual Offences (POCSO) Act, 2012, an Indian law to protect children from sexual abuse and offences. Analysis of insights extracted from the answers reveals patterns of crime and the social conditions leading to those crimes, which are important inputs for social scientists as well as the legal system.
Anthology ID:
2024.nllp-1.16
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
194–204
URL:
https://aclanthology.org/2024.nllp-1.16
Cite (ACL):
Saurav Chowdhury, Lipika Dey, and Suyog Joshi. 2024. Cross Examine: An Ensemble-based approach to leverage Large Language Models for Legal Text Analytics. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 194–204, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Cross Examine: An Ensemble-based approach to leverage Large Language Models for Legal Text Analytics (Chowdhury et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.16.pdf