LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification

Shubham Kumar Nigam; Tanmay Dubey; Govind Sharma; Noel Shallum; Kripabandhu Ghosh; Arnab Bhattacharya

doi:10.18653/v1/2025.findings-naacl.63

Abstract

In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce **LegalSeg**, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory **RhetoricLLaMA**, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conduct experiments using surrounding context and the predicted or actual labels of neighboring sentences to assess their impact on classification accuracy. Despite these advancements, challenges persist in distinguishing between closely related roles and addressing class imbalance. Our work underscores the potential of advanced techniques for improving legal document understanding and sets a strong foundation for future research in legal NLP.

Anthology ID: 2025.findings-naacl.63
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1129–1144
URL: https://aclanthology.org/2025.findings-naacl.63/
DOI: 10.18653/v1/2025.findings-naacl.63
Cite (ACL): Shubham Kumar Nigam, Tanmay Dubey, Govind Sharma, Noel Shallum, Kripabandhu Ghosh, and Arnab Bhattacharya. 2025. LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1129–1144, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification (Nigam et al., Findings 2025)
PDF: https://aclanthology.org/2025.findings-naacl.63.pdf
