Efficient Deep Learning-based Sentence Boundary Detection in Legal Text

Reshma Sheik, Gokul T, S Nirmala


Abstract
Sentence Boundary Detection (SBD) is a key component of the Natural Language Processing (NLP) pipeline. Erroneous SBD can propagate to later processing steps and reduce their performance. In well-formed corpora, only a few criteria based on punctuation and capitalization are needed to identify sentence boundaries. However, the complex structure of legal text, with its many grammatical ambiguities, makes SBD difficult. In this paper, we trained a neural network framework to identify the end of a sentence in legal text. We used several state-of-the-art deep learning models, analyzed their performance, and found that a Convolutional Neural Network (CNN) outperformed the other deep learning frameworks. We compared the results with rule-based, statistical, and transformer-based frameworks. The best neural network model outscored the popular rule-based framework with an improvement of 8% in F1 score. Although domain-specific statistical models perform slightly better, the trained CNN is 80 times faster at run time and requires little feature engineering. Furthermore, even after extensive pretraining, the transformer models fall short of the best deep learning model in overall performance.
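To illustrate why punctuation and capitalization criteria break down on legal text, the following is a minimal sketch of a naive rule-based splitter. It is not the rule-based framework evaluated in the paper; the function name `naive_split`, the regular expression, and both example sentences are assumptions chosen only for illustration.

```python
import re
from typing import List

# Assumed toy rule (not the paper's baseline): a sentence ends at '.', '?'
# or '!' when it is followed by whitespace and then an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.?!])\s+(?=[A-Z])")


def naive_split(text: str) -> List[str]:
    """Split text at every position that satisfies the toy rule above."""
    return [piece for piece in _BOUNDARY.split(text.strip()) if piece]


# Hypothetical inputs: one ordinary sentence pair and one with a case citation.
plain = "The court agreed. The appeal was dismissed."
legal = "See Roe v. Wade, 410 U.S. 113 (1973). The holding was later revisited."

print(naive_split(plain))  # two sentences, as intended
print(naive_split(legal))  # three pieces: the rule also splits after "v."
```

On the ordinary text the rule finds the single true boundary. On the citation it adds a false boundary after the abbreviation "v.", which is the kind of ambiguity that motivates a learned model for legal SBD.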
Anthology ID:
2022.nllp-1.18
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2022
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
208–217
URL:
https://aclanthology.org/2022.nllp-1.18
DOI:
10.18653/v1/2022.nllp-1.18
Cite (ACL):
Reshma Sheik, Gokul T, and S Nirmala. 2022. Efficient Deep Learning-based Sentence Boundary Detection in Legal Text. In Proceedings of the Natural Legal Language Processing Workshop 2022, pages 208–217, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Efficient Deep Learning-based Sentence Boundary Detection in Legal Text (Sheik et al., NLLP 2022)
PDF:
https://aclanthology.org/2022.nllp-1.18.pdf
Video:
https://aclanthology.org/2022.nllp-1.18.mp4