ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by Laymen

Shounak Paul; Raghav Dogra; Pawan Goyal; Saptarshi Ghosh

Abstract

Legal Statute Identification (LSI) for a given situation is one of the most fundamental tasks in Legal NLP. This task has traditionally been modeled using facts from court judgments as input queries, due to their abundance. However, in practical settings, the input queries are likely to be informal and asked by laypersons (non-professionals). While a few laypeople LSI datasets exist, little research has explored the differences between court and laypeople data for LSI. In this work, we create ILSIC, a corpus of laypeople queries covering 500+ statutes from Indian law. The corpus also contains court case judgments, enabling researchers to compare court and laypeople data for LSI. We conducted extensive experiments on our corpus, including benchmarking over the laypeople dataset using zero-shot and few-shot inference, retrieval-augmented generation, and supervised fine-tuning. We observe that models trained purely on court judgments are ineffective when tested on laypeople queries, while transfer learning from court to laypeople data can be beneficial in certain scenarios. We also conducted fine-grained analyses of our results in terms of categories of queries and frequency of statutes.

Anthology ID: 2026.findings-eacl.354
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6725–6746
URL: https://aclanthology.org/2026.findings-eacl.354/
Cite (ACL): Shounak Paul, Raghav Dogra, Pawan Goyal, and Saptarshi Ghosh. 2026. ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by Laymen. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6725–6746, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by Laymen (Paul et al., Findings 2026)
PDF: https://aclanthology.org/2026.findings-eacl.354.pdf
Checklist: 2026.findings-eacl.354.checklist.pdf
