LLM-based Embedders for Prior Case Retrieval

Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov


Abstract
In common law systems, legal professionals such as lawyers and judges rely on precedents to build their arguments. As the volume of cases has grown massively over time, effectively retrieving prior cases has become essential. Prior case retrieval (PCR) is an information retrieval (IR) task that aims to automatically identify the most relevant court cases for a specific query from a large pool of potential candidates. While IR methods have seen several paradigm shifts over the last few years, the vast majority of PCR methods continue to rely on traditional IR methods, such as BM25. The state-of-the-art deep learning IR methods have not been successful in PCR due to two key challenges: i. Lengthy legal text limitation; when using the powerful BERT-based transformer models, there is a limit of input text lengths, which inevitably requires to shorten the input via truncation or division with a loss of legal context information. ii. Lack of legal training data; due to data privacy concerns, available PCR datasets are often limited in size, making it difficult to train deep learning-based models effectively. In this research, we address these challenges by leveraging LLM-based text embedders in PCR. LLM-based embedders support longer input lengths, and since we use them in an unsupervised manner, they do not require training data, addressing both challenges simultaneously. In this paper, we evaluate state-of-the-art LLM-based text embedders in four PCR benchmark datasets and show that they outperform BM25 and supervised transformer-based models.
Anthology ID:
2025.ranlp-1.112
Volume:
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Galia Angelova, Maria Kunilovskaya, Marie Escribe, Ruslan Mitkov
Venue:
RANLP
SIG:
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Note:
Pages:
980–988
Language:
URL:
https://aclanthology.org/2025.ranlp-1.112/
DOI:
Bibkey:
Cite (ACL):
Damith Premasiri, Tharindu Ranasinghe, and Ruslan Mitkov. 2025. LLM-based Embedders for Prior Case Retrieval. In Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era, pages 980–988, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
LLM-based Embedders for Prior Case Retrieval (Premasiri et al., RANLP 2025)
Copy Citation:
PDF:
https://aclanthology.org/2025.ranlp-1.112.pdf