Automatic sentence segmentation of clinical record narratives in real-world data

Dongfang Xu; Davy Weissenbacher; Karen O’Connor; Siddharth Rawal; Graciela Hernandez

Abstract

Sentence segmentation is a linguistic task and is widely used as a pre-processing step in many NLP applications. The need for sentence segmentation is particularly pronounced in clinical notes, where ungrammatical and fragmented texts are common. We propose a straightforward and effective sequence labeling classifier to predict sentence spans using a dynamic sliding window based on the prediction of each input sequence. This sliding window algorithm allows our approach to segment long text sequences on the fly. To evaluate our approach, we annotated 90 clinical notes from the MIMIC-III dataset. Additionally, we tested our approach on five other datasets to assess its generalizability and compared its performance against state-of-the-art systems on these datasets. Our approach outperformed all the systems, achieving an F1 score that is 15% higher than the next best-performing system on the clinical dataset.
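
The abstract gives only a high-level description of the dynamic sliding window, so the following Python sketch is one possible reading of it, not the authors' implementation. It assumes a labeler that marks sentence-initial tokens within a window, and that the next window begins at the last sentence start predicted in the current window, since that sentence may have been cut off by the window edge. The names `segment`, `predict_starts`, and `toy_predict_starts`, and the `window` size, are hypothetical; the toy labeler is a punctuation rule standing in for a trained sequence labeling classifier.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def segment(
    tokens: Sequence[str],
    predict_starts: Callable[[Sequence[str]], List[int]],
    window: int = 8,
) -> List[Span]:
    """Segment a long token sequence with a prediction-driven sliding window.

    Assumption: predict_starts returns sorted indices, relative to the chunk
    it is given, of the tokens it believes begin a sentence.
    """
    spans: List[Span] = []
    n = len(tokens)
    pos = 0
    while pos < n:
        chunk = tokens[pos:pos + window]
        starts = [pos + i for i in predict_starts(chunk)]
        # Each window begins at a known sentence start.
        if not starts or starts[0] != pos:
            starts.insert(0, pos)

        if pos + len(chunk) >= n:
            # Final window: close the last sentence at the end of the text.
            boundaries = starts + [n]
            next_pos = n
        elif len(starts) > 1:
            # The last sentence may be truncated by the window edge, so
            # emit only the complete ones and restart the window there.
            boundaries = starts
            next_pos = starts[-1]
        else:
            # One sentence fills the whole window: force a cut so the
            # loop always makes progress.
            boundaries = starts + [pos + len(chunk)]
            next_pos = pos + len(chunk)

        spans.extend(zip(boundaries, boundaries[1:]))
        pos = next_pos
    return spans


def toy_predict_starts(chunk: Sequence[str]) -> List[int]:
    """Stand-in labeler: a token starts a sentence if it opens the chunk
    or follows a token ending in a period."""
    return [i for i in range(len(chunk)) if i == 0 or chunk[i - 1].endswith(".")]


tokens = ["Pt", "stable.", "No", "fever.", "Denies", "chest", "pain."]
spans = segment(tokens, toy_predict_starts, window=4)
sentences = [" ".join(tokens[s:e]) for s, e in spans]
```

Because the window advances by a data-dependent amount rather than a fixed stride, a sentence that straddles a window edge is re-read in full by the next window instead of being split at an arbitrary position.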

Anthology ID: 2024.emnlp-main.1156
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 20780–20793
URL: https://aclanthology.org/2024.emnlp-main.1156
Cite (ACL): Dongfang Xu, Davy Weissenbacher, Karen O’Connor, Siddharth Rawal, and Graciela Hernandez. 2024. Automatic sentence segmentation of clinical record narratives in real-world data. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20780–20793, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Automatic sentence segmentation of clinical record narratives in real-world data (Xu et al., EMNLP 2024)
PDF: https://aclanthology.org/2024.emnlp-main.1156.pdf
