Exploration of Open Large Language Models for eDiscovery

Sumit Pai, Sounak Lahiri, Ujjwal Kumar, Krishanu Baksi, Elijah Soba, Michael Suesserman, Nirmala Pudota, Jon Foster, Edward Bowen, Sanmitra Bhattacharya


Abstract
The rapid advancement of Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), has led to their widespread adoption for various natural language processing (NLP) tasks. One crucial domain ripe for innovation is the Technology-Assisted Review (TAR) process in Electronic discovery (eDiscovery). Traditionally, TAR involves manual review and classification of documents for relevance over large document collections for litigations and investigations. This process is aided by machine learning and NLP tools which require extensive training and fine-tuning. In this paper, we explore the application of LLMs to TAR, specifically for predictive coding. We experiment with out-of-the-box prompting and fine-tuning of LLMs using parameter-efficient techniques. We conduct experiments using open LLMs and compare them to commercially-licensed ones. Our experiments demonstrate that open LLMs lag behind commercially-licensed models in relevance classification using out-of-the-box prompting. However, topic-specific instruction tuning of open LLMs not only improve their effectiveness but can often outperform their commercially-licensed counterparts in performance evaluations. Additionally, we conduct a user study to gauge the preferences of our eDiscovery Subject Matter Specialists (SMS) regarding human-authored versus model-generated reasoning. We demonstrate that instruction-tuned open LLMs can generate high quality reasonings that are comparable to commercial LLMs.
Anthology ID:
2023.nllp-1.17
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Daniel Preoțiuc-Pietro, Catalina Goanta, Ilias Chalkidis, Leslie Barrett, Gerasimos Spanakis, Nikolaos Aletras
Venues:
NLLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
166–177
Language:
URL:
https://aclanthology.org/2023.nllp-1.17
DOI:
10.18653/v1/2023.nllp-1.17
Bibkey:
Cite (ACL):
Sumit Pai, Sounak Lahiri, Ujjwal Kumar, Krishanu Baksi, Elijah Soba, Michael Suesserman, Nirmala Pudota, Jon Foster, Edward Bowen, and Sanmitra Bhattacharya. 2023. Exploration of Open Large Language Models for eDiscovery. In Proceedings of the Natural Legal Language Processing Workshop 2023, pages 166–177, Singapore. Association for Computational Linguistics.
Cite (Informal):
Exploration of Open Large Language Models for eDiscovery (Pai et al., NLLP-WS 2023)
Copy Citation:
PDF:
https://aclanthology.org/2023.nllp-1.17.pdf